RANDOM MATRICES: 
UNIVERSALITY OF LOCAL EIGENVALUE STATISTICS 



TERENCE TAO AND VAN VU



Abstract. In this paper, we consider the universality of the local eigenvalue
statistics of random matrices. Our main result shows that these statistics are
determined by the first four moments of the distribution of the entries. As
a consequence, we derive the universality of eigenvalue gap distribution and
k-point correlation and many other statistics (under some mild assumptions)
for both Wigner Hermitian matrices and Wigner real symmetric matrices.
1. Introduction



1.1. Wigner matrices and local statistics. The goal of this paper is to establish
a universality property for the local eigenvalue statistics for random matrices. To
simplify the presentation, we are going to focus on Wigner Hermitian matrices,
which are perhaps the most prominent model in the field. We emphasize however
that our main theorem (Theorem 15) is stated in a much more general setting, and
can be applied to various other models of random matrices (such as random real
symmetric matrices, for example).

Definition 1 (Wigner matrices). Let $n$ be a large number. A Wigner Hermitian
matrix (of size $n$) is defined as a random Hermitian $n \times n$ matrix $M_n$ with upper
triangular complex entries $\zeta_{ij} := \xi_{ij} + \sqrt{-1}\,\tau_{ij}$ ($1 \le i < j \le n$) and diagonal real
entries $\xi_{ii}$ ($1 \le i \le n$) where

• For $1 \le i < j \le n$, $\xi_{ij}, \tau_{ij}$ are iid copies of a real random variable $\xi$ with
mean zero and variance 1/2.

• For $1 \le i \le n$, $\xi_{ii}$ are iid copies of a real random variable $\tilde\xi$ with mean zero
and variance 1.

• $\xi, \tilde\xi$ have exponential decay, i.e., there are constants $C, C'$ such that
$\mathbf{P}(|\xi| \ge t^C) \le \exp(-t)$, $\mathbf{P}(|\tilde\xi| \ge t^C) \le \exp(-t)$, for all $t \ge C'$.

We refer to $\xi, \tilde\xi$ as the atom distributions of $M_n$, and $\xi_{ij}, \tau_{ij}$ as the atom variables.
We refer to the matrix $W_n := \frac{1}{\sqrt{n}} M_n$ as the coarse-scale normalized Wigner
Hermitian matrix, and $A_n := \sqrt{n} M_n$ as the fine-scale normalized Wigner Hermitian
matrix.

Example 2. An important special case of a Wigner Hermitian matrix is the gaussian
unitary ensemble (GUE), in which $\xi, \tilde\xi$ are gaussian random variables with mean
zero and variance 1/2, 1 respectively. The coarse-scale normalization $W_n$ is
convenient for placing all the eigenvalues in a bounded interval, while the fine-scale
normalization $A_n$ is convenient for keeping the spacing between adjacent eigenvalues
to be roughly of unit size.
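The construction in Definition 1 and the two normalizations can be illustrated with a small sampler. This is only a sketch under stated assumptions: it uses Python's standard `random` module, represents a matrix as a list of lists of complex numbers, and the names `sample_wigner_hermitian` and `scale` are hypothetical helpers introduced here, not notation from the paper. With the default gaussian atom distributions it produces a GUE sample as in Example 2.

```python
import random

def sample_wigner_hermitian(n, rng, offdiag=None, diag=None):
    """Sample an n x n Wigner Hermitian matrix as a list of lists.

    offdiag: sampler for the real and imaginary parts of the
             upper-triangular entries (mean 0, variance 1/2).
    diag:    sampler for the real diagonal entries (mean 0, variance 1).
    The defaults are gaussian, which gives a GUE sample.
    """
    offdiag = offdiag or (lambda: rng.gauss(0.0, 0.5 ** 0.5))
    diag = diag or (lambda: rng.gauss(0.0, 1.0))
    M = [[0j] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = complex(diag(), 0.0)
        for j in range(i + 1, n):
            z = complex(offdiag(), offdiag())
            M[i][j] = z
            M[j][i] = z.conjugate()  # enforce the Hermitian symmetry
    return M

def scale(M, c):
    """Multiply every entry of M by the real scalar c."""
    return [[c * z for z in row] for row in M]

rng = random.Random(0)
n = 200
M = sample_wigner_hermitian(n, rng)
W = scale(M, n ** -0.5)  # coarse-scale normalization W_n = M_n / sqrt(n)
A = scale(M, n ** 0.5)   # fine-scale normalization A_n = sqrt(n) M_n

# Empirical second moment of the off-diagonal entries; it should be near 1.
off = [abs(M[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
mean_sq = sum(off) / len(off)
```

Passing other samplers for `offdiag` and `diag` (for instance discrete ones with the same mean and variance) gives non-gaussian Wigner matrices of the kind the universality results are concerned with.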

Given an $n \times n$ Hermitian matrix $A$, we denote its $n$ eigenvalues as

$$\lambda_1(A) \le \ldots \le \lambda_n(A),$$

and write $\lambda(A) := (\lambda_1(A), \ldots, \lambda_n(A))$. We also let $u_1(A), \ldots, u_n(A) \in \mathbf{C}^n$ be an
orthonormal basis of eigenvectors of $A$ with $A u_i(A) = \lambda_i(A) u_i(A)$; these
eigenvectors $u_i(A)$ are only determined up to a complex phase even when the eigenvalues
are simple, but this ambiguity will not cause a difficulty in our results as we will
only be interested in the magnitude $|u_i(A)^* X|$ of various inner products $u_i(A)^* X$
of $u_i(A)$ with other vectors $X$.

The study of the eigenvalues $\lambda_i(W_n)$ of (normalized) Wigner Hermitian matrices
has been one of the major topics of study in random matrix theory. The properties 
of these eigenvalues are not only interesting in their own right, but also have been 
playing essential roles in many other areas of mathematics, such as mathematical 
physics, probability, combinatorics, and the theory of computing. 

It will be convenient to introduce the following notation for frequent events de- 
pending on n, in increasing order of likelihood: 
Definition 3 (Frequent events). Let $E$ be an event depending on $n$.

• $E$ holds asymptotically almost surely if $\mathbf{P}(E) = 1 - o(1)$.

• $E$ holds with high probability if $\mathbf{P}(E) \ge 1 - O(n^{-c})$ for some constant $c > 0$.

• $E$ holds with overwhelming probability if $\mathbf{P}(E) \ge 1 - O_C(n^{-C})$ for every
constant $C > 0$ (or equivalently, that $\mathbf{P}(E) \ge 1 - \exp(-\omega(\log n))$).

• $E$ holds almost surely if $\mathbf{P}(E) = 1$.

Remark 4. Note from the union bound that the intersection of $O(n^{O(1)})$ many
events with uniformly overwhelming probability still has overwhelming probability.
Unfortunately, the same is not true for events which are merely of high probability,
which will cause some technical difficulties in our arguments.



A cornerstone of this theory is the Wigner semicircular law. Denote by $\rho_{sc}$ the
semi-circle density function with support on $[-2, 2]$,

$$\rho_{sc}(x) := \begin{cases} \frac{1}{2\pi}\sqrt{4 - x^2}, & |x| \le 2 \\ 0, & |x| > 2. \end{cases} \tag{1}$$



Theorem 5 (Semi-circular law). Let $M_n$ be a Wigner Hermitian matrix. Then for
any real number $x$,

$$\lim_{n \to \infty} \frac{1}{n} |\{1 \le i \le n : \lambda_i(W_n) \le x\}| = \int_{-2}^{x} \rho_{sc}(y)\, dy$$

in the sense of probability (and also in the almost sure sense, if the $M_n$ are all
minors of the same infinite Wigner Hermitian matrix), where we use $|I|$ to denote
the cardinality of a finite set $I$.

Remark 6. Wigner [48] proved this theorem for special ensembles. The general
version above is due to Pastur [36] (see [H [T] for a detailed discussion). The
semi-circular law in fact holds under substantially more general hypotheses than those
given in Definition 1, but we will not discuss this matter further here. One
consequence of Theorem 5 is that we expect most of the eigenvalues of $W_n$ to lie in the
interval $(-2 + \varepsilon, 2 - \varepsilon)$ for $\varepsilon > 0$ small; we shall thus informally refer to this region
as the bulk of the spectrum.



Several stronger versions of Theorem 5 are known. For instance, it is known (see
e.g. [3], 15) that asymptotically almost surely, one has

$$\lambda_j(W_n) = t\left(\frac{j}{n}\right) + O(n^{-\delta}) \tag{2}$$

for all $1 \le j \le n$ and some absolute constant $\delta > 0$, where $-2 \le t(a) \le 2$ is defined
by the formula

$$a =: \int_{-2}^{t(a)} \rho_{sc}(x)\, dx. \tag{3}$$
In particular we have

$$\sup_{1 \le i \le n} |\lambda_i(M_n)| = 2\sqrt{n}\,(1 + o(1)) \tag{4}$$

asymptotically almost surely (see [4] for further discussion).

Footnote: See Section 1.7 for our conventions on asymptotic notation.
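The quantities in (1) and (3) are easy to evaluate numerically. The following is a minimal sketch, assuming only Python's `math` module; the closed form used for the distribution function is the antiderivative of the semicircle density, and the function names are hypothetical. The classical location $t(a)$ is found by bisection, since the distribution function is increasing on $[-2, 2]$.

```python
import math

def rho_sc(x):
    """Semicircle density (1), supported on [-2, 2]."""
    if abs(x) > 2.0:
        return 0.0
    return math.sqrt(4.0 - x * x) / (2.0 * math.pi)

def semicircle_cdf(x):
    """Integral of rho_sc from -2 to x, in closed form."""
    if x <= -2.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return (0.5
            + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi)
            + math.asin(x / 2.0) / math.pi)

def classical_location(a, tol=1e-12):
    """Solve a = integral_{-2}^{t} rho_sc for t, as in (3), by bisection."""
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Midpoint-rule check that the density has total mass one.
steps = 20000
h = 4.0 / steps
total_mass = sum(rho_sc(-2.0 + (k + 0.5) * h) for k in range(steps)) * h
```

In this notation, (2) says that the $j$-th eigenvalue of $W_n$ is close to `classical_location(j / n)`; by symmetry `classical_location(0.5)` is (numerically) zero.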

Theorem 5 addressed the global behavior of the eigenvalues. The local properties
are much harder and their studies require much more sophisticated tools. Most of
the precise theorems have been obtained for the GUE, defined in Example 2. In
the next few paragraphs, we mention some of the most famous results concerning
this model.



1.2. Distribution of the spacings (gaps) of the eigenvalues of GUE. In this
section $M_n$ is understood to have the GUE distribution.

For a vector $x = (x_1, \ldots, x_n)$ where $x_1 < x_2 < \cdots < x_n$, define the normalized gap
distribution $S_n(s; x)$ by the formula

$$S_n(s; x) := \frac{1}{n} |\{1 \le i \le n : x_{i+1} - x_i \le s\}|.$$
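The normalized gap distribution just defined can be computed directly from a sorted list of points. The sketch below assumes plain Python lists; the function name is hypothetical. One convention is an assumption made here: the index $i = n$ has no successor $x_{n+1}$, so only the $n - 1$ actual gaps are counted, while the normalization by $n$ is kept as in the formula.

```python
def normalized_gap_distribution(s, x):
    """S_n(s; x): fraction (out of n) of consecutive gaps of size at most s.

    x is assumed to be sorted in increasing order.
    """
    n = len(x)
    count = sum(1 for i in range(n - 1) if x[i + 1] - x[i] <= s)
    return count / n

# Gaps are 0.5, 1.5, 0.1 and 2.9, so two of them are at most 1.
example_points = [0.0, 0.5, 2.0, 2.1, 5.0]
example_value = normalized_gap_distribution(1.0, example_points)
```

Applied to the eigenvalues of a fine-scale normalized matrix $A_n$, this is the statistic whose expectation appears in the limiting gap distribution.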

For the GUE ensemble it is known [35] that

$$\lim_{n \to \infty} \mathbf{E} S_n(s, \lambda(A_n)) = \int_0^s p(\sigma)\, d\sigma, \tag{5}$$

where $A_n := \sqrt{n} M_n$ is the fine-scale normalization of $M_n$, and $p(\sigma)$ is the Gaudin
distribution, given by the formula

$$p(s) := \frac{d^2}{ds^2} \det(I - K)_{L^2(0,s)},$$

where $K$ is the integral operator on $L^2((0, s))$ with the Dyson sine kernel

$$K(x, y) := \frac{\sin \pi(x - y)}{\pi(x - y)}. \tag{6}$$



In fact a stronger result is known in the bulk of the spectrum. Let $l_n$ be any
sequence of numbers tending to infinity such that $l_n/n$ tends to zero. Define

$$\tilde S_n(s; x, u) := \frac{1}{l_n} \left|\left\{1 \le i \le n : x_{i+1} - x_i \le \frac{s}{\rho_{sc}(u)},\ |x_i - nu| \le \frac{l_n}{\rho_{sc}(u)}\right\}\right|. \tag{7}$$

It is proved in [T^ that for any fixed $-2 < u < 2$, we have

$$\lim_{n \to \infty} \mathbf{E} \tilde S_n(s; \lambda(A_n), u) = \int_0^s p(\sigma)\, d\sigma. \tag{8}$$

The eigenvalue gap distribution has received much attention in the mathematics 
community, partially thanks to the fascinating (numerical) coincidence with the 
gap distribution of the zeros of the zeta functions. For more discussions, we refer 
to [m [29l 111] and the references therein. 
1.3. k-point correlation for GUE. Given a fine-scale normalized Wigner Hermitian
matrix $A_n$, we can define the symmetrized distribution function $\rho_n^{(n)} : \mathbf{R}^n \to \mathbf{R}^+$
to be the symmetric function on $n$ variables such that the distribution of the
eigenvalues $\lambda(A_n)$ is given by the restriction of $n!\, \rho_n^{(n)}(x_1, \ldots, x_n)\, dx_1 \ldots dx_n$ to
the region $\{x_1 \le \ldots \le x_n\}$. For any $1 \le k \le n$, the k-point correlation function
$\rho_n^{(k)} : \mathbf{R}^k \to \mathbf{R}^+$ is defined as the marginal integral of $\rho_n^{(n)}$:

$$\rho_n^{(k)}(x_1, \ldots, x_k) := \frac{n!}{(n-k)!} \int_{\mathbf{R}^{n-k}} \rho_n^{(n)}(x)\, dx_{k+1} \ldots dx_n.$$

In the GUE case, one has an explicit formula for $\rho_n^{(n)}$, obtained by Ginibre [53]:

$$\rho_n^{(n)}(x) := \frac{1}{Z_n^{(2)}} \prod_{1 \le i < j \le n} |x_i - x_j|^2 \exp\left(-\frac{1}{2n}(x_1^2 + \ldots + x_n^2)\right), \tag{9}$$

where $Z_n^{(2)} > 0$ is a normalizing constant, known as the partition function. From
this formula, one can compute $\rho_n^{(k)}$ explicitly. Indeed, it was established by Gaudin
and Mehta [34] that

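$$\rho_n^{(k)}(x_1, \ldots, x_k) = \det(K_n(x_i, x_j))_{1 \le i,j \le k} \tag{10}$$

where the kernel $K_n(x, y)$ is given by the formula

$$K_n(x, y) := \frac{1}{\sqrt{2n}}\, e^{-\frac{x^2 + y^2}{4n}} \sum_{j=0}^{n-1} h_j\left(\frac{x}{\sqrt{2n}}\right) h_j\left(\frac{y}{\sqrt{2n}}\right)$$

and $h_0, \ldots, h_{n-1}$ are the first $n$ Hermite polynomials, normalized to be orthonormal
with respect to $e^{-x^2}\, dx$. From this and the asymptotics of Hermite polynomials, it
was shown by Dyson [13] that

$$\lim_{n \to \infty} \frac{1}{\rho_{sc}(u)^k}\, \rho_n^{(k)}\left(nu + \frac{t_1}{\rho_{sc}(u)}, \ldots, nu + \frac{t_k}{\rho_{sc}(u)}\right) = \det(K(t_i, t_j))_{1 \le i,j \le k}, \tag{11}$$

for any fixed $-2 < u < 2$ and real numbers $t_1, \ldots, t_k$, where the Dyson sine kernel
$K$ was defined in (6).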
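The limiting object on the right-hand side of the Dyson formula, the determinant of the sine kernel evaluated at $t_1, \ldots, t_k$, can be computed for small $k$ with elementary code. This is a sketch under stated assumptions: pure Python, a hand-written Gaussian elimination for the determinant, and hypothetical function names. For $k = 2$ the determinant reduces to $1 - \left(\frac{\sin \pi r}{\pi r}\right)^2$ with $r = t_1 - t_2$, which vanishes as $r \to 0$ (eigenvalue repulsion).

```python
import math

def sine_kernel(x, y):
    """Dyson sine kernel K(x, y) = sin(pi (x - y)) / (pi (x - y))."""
    d = x - y
    if d == 0.0:
        return 1.0  # limiting value on the diagonal
    return math.sin(math.pi * d) / (math.pi * d)

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result

def limiting_correlation(ts):
    """det(K(t_i, t_j)) for the points ts = [t_1, ..., t_k]."""
    return det([[sine_kernel(a, b) for b in ts] for a in ts])
```

For example, `limiting_correlation([0.0, 0.5])` equals $1 - 4/\pi^2$, while two coincident points give zero.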



1.4. The universality conjecture and previous results. It has been conjec- 
tured, since the 1960s, by Wigner, Dyson, Mehta, and many others, that the local 
statistics (such as the above limiting distributions) are universal, in the sense that 
they hold not only for the GUE, but for any other Wigner random matrix also. 
This conjecture was motivated by similar phenomena in physics, such as the same 
laws of thermodynamics, which should emerge no matter what the details of atomic 
interaction. 

The universality conjecture is one of the central questions in the theory of random
matrices. In many cases, it is stated for a specific local statistic (such as the
gap distribution or the k-point correlation, see [33, page 9] for example). These
problems have been discussed in numerous books and surveys (see [331 ITS] ). 
Despite the conjecture's long and distinguished history and the overwhelming
supporting numerical evidence, rigorous results on this problem for general Wigner
random matrices have only begun to emerge recently. At the edge of the spectrum,
Soshnikov [41] proved the universality of the joint distribution of the largest $k$
eigenvalues (for any fixed $k$), under the extra assumption that the atom distribution is
symmetric:

Theorem 7. [41] Let $k$ be a fixed integer and $M_n$ be a Wigner Hermitian matrix,
whose atom distribution is symmetric. Set $W_n := \frac{1}{\sqrt{n}} M_n$. Then the joint
distribution of the $k$ dimensional random vector

$$\left((\lambda_n(W_n) - 2)\, n^{2/3}, \ldots, (\lambda_{n-k}(W_n) - 2)\, n^{2/3}\right)$$

has a weak limit as $n \to \infty$, which coincides with that in the GUE case. The result
also holds for the smallest eigenvalues $\lambda_1, \ldots, \lambda_k$.



Note that this significantly strengthens (4) in the symmetric case. (For the
non-symmetric case, see [38], [39] for some recent results).

Returning to the bulk of the spectrum, Johansson [27] proved (11) and (8) for
random Hermitian matrices whose entries are gauss divisible. (See also the paper [7]
of Ben Arous and Péché where they discussed the removal of a technical condition in
[27].) More precisely, Johansson considered the model $M_n = (1-t)^{1/2} M_n^1 + t^{1/2} M_n^2$,
where $0 < t \le 1$ is fixed (i.e. independent of $n$), $M_n^1$ is a Wigner Hermitian matrix
and $M_n^2$ is a GUE matrix independent of $M_n^1$. We will refer to such matrices as
Johansson matrices.

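Theorem 8. [27] (11) (in the weak sense) and (8) (and hence (5)) hold for
Johansson matrices, as $n \to \infty$. By "weak sense", we mean that

$$\lim_{n \to \infty} \frac{1}{\rho_{sc}(u)^k} \int_{\mathbf{R}^k} f(t_1, \ldots, t_k)\, \rho_n^{(k)}\left(nu + \frac{t_1}{\rho_{sc}(u)}, \ldots, nu + \frac{t_k}{\rho_{sc}(u)}\right) dt_1 \ldots dt_k = \int_{\mathbf{R}^k} f(t_1, \ldots, t_k) \det(K(t_i, t_j))_{1 \le i,j \le k}\, dt_1 \ldots dt_k \tag{12}$$

for any test function $f \in C_c(\mathbf{R}^k)$.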



The property of being gauss divisible can be viewed as a strong regularity
assumption on the atom distribution. Very recently, Erdős, Péché, Ramirez, Schlein and
Yau [20], [21] have relaxed this regularity assumption significantly. In particular in
[21] an analogue of Theorem 8 (with $k = 2$ for the correlation and $l_n$ polynomial in
$n$ for the gap distribution) is proven assuming that the atom distribution is of the
form

$$\nu\, dx = e^{-V(x)} e^{-x^2}\, dx$$

where $V(x) \in C^6$ and $\sum_{j=1}^{6} |V^{(j)}(x)| \le C(1 + x^2)^k$ and $\nu(x) \le C' \exp(-\delta x^2)$, for
some fixed $k, \delta, C, C'$. It was remarked in [21] that the last (exponential decay)
assumption can be weakened somewhat.
We note that both Erdős et al. and our approach make important, but different,
use of the ideas and results by Erdős, Schlein and Yau ([17,18,19]) concerning the
rate of convergence to the semi-circle law. (See page 24 and Section 5 for a detailed
description.)

Finally, let us mention that in a different direction, universality was established by
Deift, Kriecherbauer, McLaughlin, Venakides and Zhou [16], Pastur and Shcherbina
[37], Bleher and Its for a different model of random matrices, where the joint
distribution of the eigenvalues is given explicitly by the formula

$$\rho_n^{(n)}(x_1, \ldots, x_n) := c_n \prod_{1 \le i < j \le n} |x_i - x_j|^2 \exp(-V(x)), \tag{13}$$

where $V$ is a general function and $c_n > 0$ is a normalization factor. The case
$V = x^2$ corresponds to (9). For a general $V$, the entries of the matrix are
correlated, and so this model differs from the Wigner model. (See [35] for some recent
developments concerning these models, which are studied using the machinery of
orthogonal polynomials.)

One of the main difficulties in establishing universality for general matrix
ensembles lies in the fact that most of the results obtained in the GUE case (and the
case in Johansson's theorem and those in [16], [37], E]) came from heavy use of the
explicit joint distribution of the eigenvalues such as (9) and (13). The desired
limiting distributions were proved using estimates on integrals with respect to these
measures. Very powerful tools have been developed to handle this task (see [331114]
for example), but they cannot be applied for general Wigner matrices where an
explicit measure is not available.

Nevertheless, some methods have been developed which do not require the explicit
joint distribution. For instance, Soshnikov's result [41] was obtained using the
(combinatorial) trace method rather than from an explicit formula for the distribution,
although it is well understood that this method, while efficient for the study of
the edge, is of much less use in the study of the spacing distribution in the bulk of
the spectrum. The recent argument in [20] also avoids explicit formulae, relying
instead on an analysis of the Dyson Brownian motion, which describes the stochastic
dynamics of the spectrum of Johansson matrices $M_n = (1-t)^{1/2} M_n^1 + t^{1/2} M_n^2$ in
the $t$ variable. (On the other hand, the argument in [21] uses explicit formulae for
the joint distribution.) However, it appears that their method still requires a high
degree of regularity on the atom distribution, whereas here we shall be interested
in methods that do not require any regularity hypotheses at all (and in particular
will be applicable to discrete atom distributions).



Footnote: Subsequently to the release of this paper, we have realized that the two methods can in fact
be combined to address the gap distribution problem and the k-point correlation problem even
for discrete distributions without requiring moment conditions; see [22] for details.
1.5. Universality theorems. In this paper, we introduce a new method to study
the local statistics. This method is based on the Lindeberg strategy [32] of replacing
non-gaussian random variables with gaussian ones. (For more modern discussions
about Lindeberg's method, see [8], [40].) Using this method, we are able to prove
universality for general Wigner matrices under very mild assumptions. For instance,
we have

Theorem 9 (Universality of gap). The limiting gap distribution (5) holds for
Wigner Hermitian matrices whose atom distribution $\xi$ has support on at least 3
points. The stronger version (8) holds for Wigner Hermitian matrices whose atom
distribution $\xi$ has support on at least 3 points and the third moment $\mathbf{E}\xi^3$ vanishes.

Remark 10. Our method also enables us to prove the universality of the variance
and higher moments. Thus, the whole distribution of $S_n(s, \lambda)$ is universal, not only
its expectation. See Remark 31.

Theorem 11 (Universality of correlation). The k-point correlation (11) (in the
weak sense) holds for Wigner Hermitian matrices whose atom distribution $\xi$ has
support on at least 3 points and the third moment $\mathbf{E}\xi^3$ vanishes.

These theorems (and several others, see Section 1.6) are consequences of our more
general main theorem below (Theorem 15). Roughly speaking, Theorem 15 states
that the local statistics of the eigenvalues of a random matrix are determined by the
first four moments of the atom distributions.

Theorem 15 applies in a very general setting. We will consider random Hermitian
matrices $M_n$ with entries obeying the following condition.

Definition 12 (Condition C0). A random Hermitian matrix $A_n = (\zeta_{ij})_{1 \le i,j \le n}$ is
said to obey condition C0 if

• The $\zeta_{ij}$ are independent (but not necessarily identically distributed) for
$1 \le i \le j \le n$, and have mean zero and variance 1.

• (Uniform exponential decay) There exist constants $C, C' > 0$ such that

$$\mathbf{P}(|\zeta_{ij}| \ge t^C) \le \exp(-t) \tag{14}$$

for all $t \ge C'$ and $1 \le i, j \le n$.

Clearly, all Wigner Hermitian matrices obey condition C0. However, the class of
matrices obeying condition C0 is much richer. For instance the gaussian orthogonal
ensemble (GOE), in which $\zeta_{ij} = N(0,1)$ independently for all $i < j$ and $\zeta_{ii} = N(0,2)$,
is also essentially of this form (see the footnote on the GOE diagonal variance), and
so are all Wigner real symmetric matrices (the definition of which is given at the
end of this section).

Definition 13 (Moment matching). We say that two complex random variables $\zeta$
and $\zeta'$ match to order $k$ if

$$\mathbf{E}\, \mathrm{Re}(\zeta)^m\, \mathrm{Im}(\zeta)^l = \mathbf{E}\, \mathrm{Re}(\zeta')^m\, \mathrm{Im}(\zeta')^l$$

for all $m, l \ge 0$ such that $m + l \le k$.

Footnote: Note that for GOE a diagonal entry has variance 2 rather than 1. We thank Sean O'Rourke
for pointing out this issue. On the other hand, Theorem 15 still holds if we change the variances
of the diagonal entries, see Remark 16.
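Moment matching in the sense of Definition 13 is a finite check for finitely supported distributions. The sketch below is illustrative only: a distribution is represented as a list of (value, probability) pairs, and the names `mixed_moment` and `match_to_order` are hypothetical. The example compares the symmetric Bernoulli variable (values $\pm 1$) with a three-point variable whose first four moments are $0, 1, 0, 3$ (the same as those of a standard gaussian); the two match to order 3 but not to order 4.

```python
import math

def mixed_moment(dist, m, l):
    """E Re(z)^m Im(z)^l for a finite distribution [(value, prob), ...]."""
    return sum(p * complex(z).real ** m * complex(z).imag ** l
               for z, p in dist)

def match_to_order(d1, d2, k, tol=1e-12):
    """True if all mixed moments with m + l <= k agree up to tol."""
    return all(abs(mixed_moment(d1, m, l) - mixed_moment(d2, m, l)) <= tol
               for m in range(k + 1)
               for l in range(k + 1 - m))

# Symmetric Bernoulli: moments 0, 1, 0, 1.
bernoulli = [(-1.0, 0.5), (1.0, 0.5)]

# Three-point distribution with moments 0, 1, 0, 3.
r = math.sqrt(3.0)
three_point = [(-r, 1.0 / 6.0), (0.0, 2.0 / 3.0), (r, 1.0 / 6.0)]
```

Here `match_to_order(bernoulli, three_point, 3)` holds while `match_to_order(bernoulli, three_point, 4)` fails, because the fourth moments are 1 and 3 respectively.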

Example 14. Given two random matrices $A_n = (\zeta_{ij})_{1 \le i,j \le n}$ and $A'_n = (\zeta'_{ij})_{1 \le i,j \le n}$
obeying condition C0, $\zeta_{ij}$ and $\zeta'_{ij}$ automatically match to order 1. If they are
both Wigner Hermitian matrices, then they automatically match to order 2. If
furthermore they are also symmetric (i.e. $A_n$ has the same distribution as $-A_n$,
and similarly for $A'_n$ and $-A'_n$) then $\zeta_{ij}$ and $\zeta'_{ij}$ automatically match to order 3.

Our main result is 

Theorem 15 (Four Moment Theorem). There is a small positive constant $c_0$ such
that for every $0 < \varepsilon < 1$ and $k \ge 1$ the following holds. Let $M_n = (\zeta_{ij})_{1 \le i,j \le n}$
and $M'_n = (\zeta'_{ij})_{1 \le i,j \le n}$ be two random matrices satisfying C0. Assume furthermore
that for any $1 \le i < j \le n$, $\zeta_{ij}$ and $\zeta'_{ij}$ match to order 4 and for any $1 \le i \le n$, $\zeta_{ii}$
and $\zeta'_{ii}$ match to order 2. Set $A_n := \sqrt{n} M_n$ and $A'_n := \sqrt{n} M'_n$, and let $G : \mathbf{R}^k \to \mathbf{R}$
be a smooth function obeying the derivative bounds

$$|\nabla^j G(x)| \le n^{c_0} \tag{15}$$

for all $0 \le j \le 5$ and $x \in \mathbf{R}^k$. Then for any $\varepsilon n \le i_1 < i_2 < \cdots < i_k \le (1 - \varepsilon) n$, and
for $n$ sufficiently large depending on $\varepsilon, k$ (and the constants $C, C'$ in Definition 12)
we have

$$|\mathbf{E}(G(\lambda_{i_1}(A_n), \ldots, \lambda_{i_k}(A_n))) - \mathbf{E}(G(\lambda_{i_1}(A'_n), \ldots, \lambda_{i_k}(A'_n)))| \le n^{-c_0}. \tag{16}$$

If $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order 3 rather than 4, then there is a positive constant
$C$ independent of $c_0$ such that the conclusion (16) still holds provided that one
strengthens (15) to

$$|\nabla^j G(x)| \le n^{-C j c_0}$$

for all $0 \le j \le 5$ and $x \in \mathbf{R}^k$.

The proof of this theorem begins at Section 3.3. As mentioned earlier, Theorem
15 asserts (roughly speaking) that the fine spacing statistics of a random Hermitian
matrix in the bulk of the spectrum are only sensitive to the first four moments of
the entries. It may be possible to reduce the number of matching moments in this
theorem, but this seems to require a refinement to the method; see Section 3.2 for
further discussion.

Remark 16. Theorem 15 still holds if we assume that the diagonal entries $\zeta_{ii}$
and $\zeta'_{ii}$ have the same mean and variance, for all $1 \le i \le n$, but these means and
variances can be different at different $i$. The proof is essentially the same. In our
analysis, we consider the random vector formed by non-diagonal entries of a row,
and it is important that these entries have mean zero and the same variance, but
the mean and variance of the diagonal entry never plays a role. Details will appear
elsewhere.

Remark 17. In a subsequent paper [47], we show that the condition $\varepsilon n \le i_1 <
i_2 < \cdots < i_k \le (1 - \varepsilon) n$ can be omitted. In other words, Theorem 15 also holds for
eigenvalues at the edge of the spectrum.

Applying Theorem 15 for the special case when $M'_n$ is GUE, we obtain

Corollary 18. Let $M_n$ be a Wigner Hermitian matrix whose atom distribution $\xi$
satisfies $\mathbf{E}\xi^3 = 0$ and $\mathbf{E}\xi^4 = \frac{3}{4}$ and $M'_n$ be a random matrix sampled from GUE.
Then with $G, A_n, A'_n$ as in the previous theorem, and $n$ sufficiently large, one has

$$|\mathbf{E}(G(\lambda_{i_1}(A_n), \ldots, \lambda_{i_k}(A_n))) - \mathbf{E}(G(\lambda_{i_1}(A'_n), \ldots, \lambda_{i_k}(A'_n)))| \le n^{-c_0}. \tag{17}$$

In the proof of Theorem 15, the following lower tail estimate on the consecutive
spacings plays an important role. This theorem is of independent interest, and will
also help in applications of Theorem 15.

Theorem 19 (Lower tail estimates). Let $0 < \varepsilon < 1$ be a constant, and let $M_n$ be a
random matrix obeying Condition C0. Set $A_n := \sqrt{n} M_n$. Then for every $c_0 > 0$,
and for $n$ sufficiently large depending on $\varepsilon$, $c_0$ and the constants $C, C'$ in Definition
12, and for each $\varepsilon n \le i \le (1 - \varepsilon) n$, one has $\lambda_{i+1}(A_n) - \lambda_i(A_n) \ge n^{-c_0}$ with high
probability. In fact, one has

$$\mathbf{P}(\lambda_{i+1}(A_n) - \lambda_i(A_n) \le n^{-c_0}) \le n^{-c_1}$$

for some $c_1 > 0$ depending on $c_0$ (and independent of $\varepsilon$).

The proof of this theorem begins at Section 3.5.

1.6. Applications. By using Theorem 15 and Theorem 19 in combination with
existing results in the literature for GUE (or other special random matrix
ensembles) one can establish universal asymptotic statistics for a wide range of random
matrices. For instance, consider the $i$-th eigenvalue $\lambda_i(M_n)$ of a Wigner Hermitian
matrix. In the GUE case, Gustavsson [26], based on [42], [10], proved that $\lambda_i$ has
gaussian fluctuation:

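Theorem 20 (Gaussian fluctuation for GUE). [26] Let $i = i(n)$ be such that
$i/n \to c$ as $n \to \infty$ for some $0 < c < 1$. Let $M_n$ be drawn from the GUE. Set
$A_n := \sqrt{n} M_n$. Then

$$\sqrt{\frac{4 - t(i/n)^2}{2}}\ \frac{\lambda_i(A_n) - t(i/n)\, n}{\sqrt{\log n}} \to N(0, 1)$$

in the sense of distributions, where $t(\cdot)$ is defined in (3). (More informally, we have
$\lambda_i(M_n) \approx t(i/n)\sqrt{n} + N\left(0, \frac{2 \log n}{(4 - t(i/n)^2)\, n}\right)$.)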

As an application of our main results, we have 

Corollary 21 (Universality of gaussian fluctuation). The conclusion of Theorem
20 also holds for any other Wigner Hermitian matrix $M_n$ whose atom distribution
$\xi$ satisfies $\mathbf{E}\xi^3 = 0$ and $\mathbf{E}\xi^4 = \frac{3}{4}$.



Proof. Let $M_n$ be a Wigner Hermitian matrix, and let $M'_n$ be drawn from GUE.
Let $i, c, t$ be as in Theorem 20, and let $c_0$ be as in Theorem 19. In view of Theorem
20, it suffices to show that

$$\mathbf{P}(\lambda_i(A'_n) \in I_-) - n^{-c_0} \le \mathbf{P}(\lambda_i(A_n) \in I) \le \mathbf{P}(\lambda_i(A'_n) \in I_+) + n^{-c_0} \tag{18}$$

for all intervals $I = [a, b]$, and $n$ sufficiently large depending on $i$ and the constants
$C, C'$ in Definition 1, where $I_+ := [a - n^{-c_0/10}, b + n^{-c_0/10}]$ and
$I_- := [a + n^{-c_0/10}, b - n^{-c_0/10}]$.

We will just prove the second inequality in (18), as the first is very similar. We
define a smooth bump function $G : \mathbf{R} \to \mathbf{R}^+$ equal to one on $I$ and vanishing
outside of $I_+$. Then we have

$$\mathbf{P}(\lambda_i(A_n) \in I) \le \mathbf{E} G(\lambda_i(A_n))$$

and

$$\mathbf{E} G(\lambda_i(A'_n)) \le \mathbf{P}(\lambda_i(A'_n) \in I_+).$$

On the other hand, one can choose $G$ to obey (15). Thus by Corollary 18 we have

$$|\mathbf{E} G(\lambda_i(A_n)) - \mathbf{E} G(\lambda_i(A'_n))| \le n^{-c_0}$$

and the second inequality in (18) follows from the triangle inequality. The first
inequality is similarly proven using a smooth function that equals 1 on $I_-$ and
vanishes outside of $I$. □
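The smooth cutoff used in this kind of argument, a function equal to one on an inner interval and vanishing outside a slightly larger one, can be written down explicitly. The following is one standard construction (based on $e^{-1/v}$), offered as a sketch rather than as the specific function used by the authors; the names `smooth_step` and `bump` and the parameter `delta` (the width of the transition layer, which would be $n^{-c_0/10}$ in the proof) are assumptions introduced here.

```python
import math

def smooth_step(u):
    """Infinitely differentiable step: 0 for u <= 0, 1 for u >= 1."""
    def g(v):
        return math.exp(-1.0 / v) if v > 0.0 else 0.0
    # At least one of u, 1 - u is >= 1/2, so the denominator is positive.
    return g(u) / (g(u) + g(1.0 - u))

def bump(x, a, b, delta):
    """Smooth function equal to 1 on [a, b] and 0 outside [a - delta, b + delta]."""
    left = smooth_step((x - (a - delta)) / delta)
    right = smooth_step(((b + delta) - x) / delta)
    return left * right
```

Each derivative of `bump` costs a factor of order `1 / delta`, so with `delta` equal to $n^{-c_0/10}$ the first five derivatives are of size at most about $n^{c_0/2}$, which is the kind of growth a derivative bound of the form $n^{c_0}$ allows.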

Remark 22. The same argument lets one establish the universality of the
asymptotic joint distribution law for any $k$ eigenvalues $\lambda_{i_1}(M_n), \ldots, \lambda_{i_k}(M_n)$ in the bulk
of the spectrum of a Wigner Hermitian matrix for any fixed $k$ (the GUE case is
treated in [26]). In particular, we have the generalization

$$\begin{aligned} & \mathbf{P}(\lambda_{i_j}(A'_n) \in I_{j,-} \text{ for all } 1 \le j \le k) + O_k(n^{-c_0}) \\ &\quad \le \mathbf{P}(\lambda_{i_j}(A_n) \in I_j \text{ for all } 1 \le j \le k) \\ &\quad \le \mathbf{P}(\lambda_{i_j}(A'_n) \in I_{j,+} \text{ for all } 1 \le j \le k) + O_k(n^{-c_0}) \end{aligned} \tag{19}$$

for all $i_1, \ldots, i_k$ between $\varepsilon n$ and $(1 - \varepsilon) n$ for some fixed $\varepsilon > 0$, and all intervals
$I_1, \ldots, I_k$, assuming $n$ is sufficiently large depending on $\varepsilon$ and $k$, and $I_{j,-} \subset I_j \subset I_{j,+}$
are defined as in the proof of Corollary 21. The details are left as an exercise
to the interested reader.



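Another quantity of interest is the least singular value

$$\sigma_n(M_n) := \inf_{1 \le i \le n} |\lambda_i(M_n)|$$

of a Wigner Hermitian matrix. In the GUE case, we have the following asymptotic
distribution:

Theorem 23 (Distribution of least singular value of GUE). [V, Theorem 3.1.2],
[28] For any fixed $t > 0$, and $M_n$ drawn from GUE, one has

$$\mathbf{P}\left(\sigma_n(M_n) \ge \frac{t}{2\sqrt{n}}\right) \to \exp\left(\int_0^t \frac{f(x)}{x}\, dx\right)$$

as $n \to \infty$, where $f : \mathbf{R} \to \mathbf{R}$ is the solution of the differential equation

$$(t f'')^2 + 4 (t f' - f)(t f' - f + (f')^2) = 0$$

with the asymptotics $f(t) = -\frac{t}{\pi} - \frac{t^2}{\pi^2} - \frac{t^3}{\pi^3} + O(t^4)$ as $t \to 0$.
(The right-hand side tends to 1 as $t \to 0$, so it describes the probability that no
eigenvalue lies within $\frac{t}{2\sqrt{n}}$ of the origin.)
Using our theorems, we can extend this result to more general ensembles: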
Using our theorems, we can extend this result to more general ensembles: 
Corollary 24 (Universality of the distribution of the least singular value). The
conclusions of Theorem 23 also hold for any other Wigner Hermitian matrix $M_n$
whose atom distribution $\xi$ has $\mathbf{E}\xi^3 = 0$ and $\mathbf{E}\xi^4 = \frac{3}{4}$.

Proof. Let $M_n$ be a Wigner Hermitian matrix, and let $M'_n$ be drawn from GUE.
Let $N_I$ be the number of eigenvalues of $W'_n$ in an interval $I$. It is well known (see
[H Chapter 4]) that

$$N_I = n \int_I \rho_{sc}(x)\, dx + O(\log n) \tag{20}$$

asymptotically almost surely (cf. (2) and Theorem 20). Applying this fact to the
two intervals $I = [-\infty, \pm\frac{t}{2n}]$, we conclude that

$$\mathbf{P}\left(\lambda_{n/2 \pm \log^2 n}(M'_n) \in \left(-\frac{t}{2\sqrt{n}}, \frac{t}{2\sqrt{n}}\right)\right) = o(1)$$

for either choice of sign $\pm$. Using (18) (or (19)) (and modifying $t$ slightly), we
conclude that the same statement is true for $M_n$. In particular, we have

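$$\mathbf{P}\left(\sigma_n(M_n) > \frac{t}{2\sqrt{n}}\right) = \sum_{n/2 - \log^2 n \le i \le n/2 + \log^2 n} \mathbf{P}\left(\lambda_i(M_n) < -\frac{t}{2\sqrt{n}} \wedge \lambda_{i+1}(M_n) > \frac{t}{2\sqrt{n}}\right) + o(1)$$

and similarly for $M'_n$. Using (19), we see that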



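$$\mathbf{P}\left(\lambda_i(M_n) < -\frac{t}{2\sqrt{n}} \wedge \lambda_{i+1}(M_n) > \frac{t}{2\sqrt{n}}\right) \le \mathbf{P}\left(\lambda_i(M'_n) < -\frac{t}{2\sqrt{n}} + n^{-c_0/10} \wedge \lambda_{i+1}(M'_n) > \frac{t}{2\sqrt{n}} - n^{-c_0/10}\right) + O(n^{-c})$$

and

$$\mathbf{P}\left(\lambda_i(M_n) < -\frac{t}{2\sqrt{n}} \wedge \lambda_{i+1}(M_n) > \frac{t}{2\sqrt{n}}\right) \ge \mathbf{P}\left(\lambda_i(M'_n) < -\frac{t}{2\sqrt{n}} - n^{-c_0/10} \wedge \lambda_{i+1}(M'_n) > \frac{t}{2\sqrt{n}} + n^{-c_0/10}\right) - O(n^{-c})$$

for some $c > 0$. Putting this together, we conclude that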



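$$\mathbf{P}\left(\sigma_n(M'_n) > \frac{t}{2\sqrt{n}} + n^{-c_0/10}\right) + o(1) \le \mathbf{P}\left(\sigma_n(M_n) > \frac{t}{2\sqrt{n}}\right) \le \mathbf{P}\left(\sigma_n(M'_n) > \frac{t}{2\sqrt{n}} - n^{-c_0/10}\right) + o(1)$$

and the claim follows. □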

Remark 25. A similar universality result for the least singular value of
non-Hermitian matrices was recently established by the authors in [46]. Our arguments
in [46] also used the Lindeberg strategy, but were rather different in many other
respects (in particular, they proceeded by analyzing random submatrices of the
inverse matrix $M_n^{-1}$). One consequence of Corollary 24 is that $M_n$ is asymptotically
almost surely invertible. For discrete random matrices, this is already a non-trivial
fact, first proven in [?]. If Theorem 23 can be extended to the Johansson matrices
considered in [27], then the arguments below would allow one to remove the fourth
moment hypothesis in Corollary 24 (assuming that $\xi$ is supported on at least three
points).

Remark 26. The above corollary still holds under a weaker assumption that the
first three moments of $\xi$ match those of the gaussian variable; in other words, we
can omit the last assumption that $\mathbf{E}\xi^4 = 3/4$. Details will appear elsewhere.

Remark 27. By combining this result with (4) one also obtains a universal
distribution for the condition number $\sigma_1(M_n)/\sigma_n(M_n)$ of Wigner Hermitian matrices
(note that the non-independent nature of $\sigma_1(M_n)$ and $\sigma_n(M_n)$ is not relevant,
because (4) gives enough concentration of $\sigma_1(M_n)$ that it can effectively be replaced
with $2\sqrt{n}$). We omit the details.

Now we are going to prove the first part of Theorem 9. Note that in contrast
to previous applications, we are making no assumptions on the third and fourth
moments of the atom distribution $\xi$. The extra observation here is that we do
not always need to compare $M_n$ with GUE. It is sufficient to compare it with any
model where the desired statistics have been computed. In this case, we are going
to compare $M_n$ with a Johansson matrix. The definition of Johansson matrices
provides more degrees of freedom via the parameters $t$ and $M_n^1$, and we can use
this to remove the condition of the third and fourth moments.

Lemma 28 (Truncated moment matching problem). Let $\xi$ be a real random
variable with mean zero, variance 1, third moment $\mathbf{E}\xi^3 = \alpha_3$, and fourth moment
$\mathbf{E}\xi^4 = \alpha_4 < \infty$. Then $\alpha_4 - \alpha_3^2 - 1 \ge 0$, with equality if and only if $\xi$ is supported on
exactly two points. Conversely, if $\alpha_4 - \alpha_3^2 - 1 \ge 0$, then there exists a real random
variable with the specified moments.



Proof. For any real numbers $a, b$, we have

$$0 \le \mathbf{E}(\xi^2 + a\xi + b)^2 = \alpha_4 + 2a\alpha_3 + a^2 + 2b + b^2.$$

Setting $b := -1$ and $a := -\alpha_3$ we obtain the inequality $\alpha_4 - \alpha_3^2 - 1 \ge 0$. Equality
only occurs when $\mathbf{E}(\xi^2 - \alpha_3 \xi - 1)^2 = 0$, which by the quadratic formula implies
that $\xi$ is supported on at most two points.

Now we show that every pair $(\alpha_3, \alpha_4)$ with $\alpha_4 - \alpha_3^2 - 1 \ge 0$ arises as the moments
of a random variable with mean zero and variance 1. The set of all such moments
is clearly convex, so it suffices to check the case when $\alpha_4 - \alpha_3^2 - 1 = 0$. But if
one considers the random variable $\xi$ which equals $\tan\theta$ with probability $\cos^2\theta$ and
$-\cot\theta$ with probability $\sin^2\theta$ for some $-\pi/2 < \theta < \pi/2$, $\theta \ne 0$, one easily computes
that $\xi$ has mean zero, variance 1, third moment $-2\cot(2\theta)$, and fourth moment
$4\,\mathrm{cosec}(2\theta)^2 - 3$, and the claim follows from the trigonometric identity
$\mathrm{cosec}(2\theta)^2 = \cot(2\theta)^2 + 1$. □
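The moment computation for the two-point variable in this proof can be checked numerically. The sketch below assumes only Python's `math` module; `two_point_moments` is a hypothetical helper name, and the angle 0.7 is an arbitrary test value. It evaluates the first four moments of the variable equal to $\tan\theta$ with probability $\cos^2\theta$ and $-\cot\theta$ with probability $\sin^2\theta$.

```python
import math

def two_point_moments(theta):
    """First four moments of the two-point variable from the proof of Lemma 28."""
    values = (math.tan(theta), -1.0 / math.tan(theta))
    probs = (math.cos(theta) ** 2, math.sin(theta) ** 2)
    return [sum(p * v ** j for v, p in zip(values, probs))
            for j in range(1, 5)]

theta = 0.7
mean, second, alpha3, alpha4 = two_point_moments(theta)

# Closed forms for the third and fourth moments.
expected_alpha3 = -2.0 / math.tan(2.0 * theta)
expected_alpha4 = 4.0 / math.sin(2.0 * theta) ** 2 - 3.0
```

The quantity `alpha4 - alpha3 ** 2 - 1` comes out as zero up to rounding, which is the equality case of the lemma.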
Remark 29. The more general truncated moment problem (i.e., the truncated
version of the classical Hamburger moment sequence problem, see [9l|3^) was solved
by Curto and Fialkow 0.

Corollary 30 (Matching lemma). Let $\xi$ be a real random variable with mean zero
and variance 1, which is supported on at least three points. Then $\xi$ matches to order
4 with $(1-t)^{1/2}\xi' + t^{1/2}\xi_G$ for some $0 < t < 1$ and some independent $\xi', \xi_G$ of mean
zero and variance 1, where $\xi_G = N(0, 1)$ is Gaussian.



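Proof. The formal characteristic function $\mathbf{E} e^{s\xi} := \sum_{j=0}^{\infty} \frac{s^j}{j!} \mathbf{E}\xi^j$ has the expansion
$1 + \frac{1}{2}s^2 + \frac{1}{6}\alpha_3 s^3 + \frac{1}{24}\alpha_4 s^4 + O(s^5)$; by Lemma 28, we have $\alpha_4 - \alpha_3^2 - 1 > 0$. Observe
that $\xi$ will match to order 4 with $(1-t)^{1/2}\xi' + t^{1/2}\xi_G$ if and only if one has the
identity

$$1 + \frac{1}{2}s^2 + \frac{1}{6}\alpha_3 s^3 + \frac{1}{24}\alpha_4 s^4 = \left(1 + \frac{1-t}{2}s^2 + \frac{(1-t)^{3/2}}{6}\alpha'_3 s^3 + \frac{(1-t)^2}{24}\alpha'_4 s^4\right)\left(1 + \frac{t}{2}s^2 + \frac{t^2}{8}s^4\right) + O(s^5)$$

where $\alpha'_3, \alpha'_4$ are the moments of $\xi'$. Formally dividing out by $1 + \frac{t}{2}s^2 + \frac{t^2}{8}s^4$, one can
thus solve for $\alpha'_3, \alpha'_4$ in terms of $\alpha_3, \alpha_4$. Observe that as $t \to 0$, $\alpha'_3, \alpha'_4$ must converge
to $\alpha_3, \alpha_4$ respectively. Thus, for $t$ sufficiently small, we will have $\alpha'_4 - (\alpha'_3)^2 - 1 > 0$.
The claim now follows from Lemma 28. □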
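The step of solving for the moments of the non-gaussian component can be made explicit. Since a standard gaussian has third moment 0 and fourth moment 3, expanding the third and fourth moments of $(1-t)^{1/2}\xi' + t^{1/2}\xi_G$ (with $\xi'$, $\xi_G$ independent, mean zero, variance 1) gives $\alpha_3 = (1-t)^{3/2}\alpha'_3$ and $\alpha_4 = (1-t)^2\alpha'_4 + 6t(1-t) + 3t^2$. The sketch below inverts these relations and searches for an admissible $t$ by repeated halving; the function names and the halving search are illustrative assumptions, not the authors' procedure.

```python
def component_moments(alpha3, alpha4, t):
    """Third and fourth moments that xi' must have for the mixture to match."""
    a3p = alpha3 / (1.0 - t) ** 1.5
    a4p = (alpha4 - 6.0 * t * (1.0 - t) - 3.0 * t * t) / (1.0 - t) ** 2
    return a3p, a4p

def is_realizable(a3, a4):
    """Moment condition of Lemma 28 for mean zero and variance one."""
    return a4 - a3 * a3 - 1.0 >= 0.0

def find_t(alpha3, alpha4, t0=0.5, t_min=1e-12):
    """Largest t in {t0, t0/2, t0/4, ...} for which xi' is realizable, else None."""
    t = t0
    while t > t_min:
        if is_realizable(*component_moments(alpha3, alpha4, t)):
            return t
        t /= 2.0
    return None

# A variable with alpha3 = 0, alpha4 = 1.5 (strict inequality in Lemma 28).
t_found = find_t(0.0, 1.5)

# The symmetric Bernoulli variable (alpha3 = 0, alpha4 = 1) is the equality
# case, supported on two points: no positive t works.
t_bernoulli = find_t(0.0, 1.0)
```

For $\alpha_3 = 0$, $\alpha_4 = 1.5$ the condition reduces to $2t^2 - 4t + 0.5 \ge 0$, that is $t \le 1 - \sqrt{3}/2 \approx 0.134$, so the search stops at $t = 0.125$; for the Bernoulli case it returns `None`, consistent with the requirement of support on at least three points.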



Proof. (Proof of the first part of Theorem 9) Let $M_n$ be as in this theorem and
consider (5). By Corollary 30, we can find a Johansson matrix $M'_n$ which matches
$M_n$ to order 4. By Theorem 8, (5) already holds for $M'_n$. Thus it will suffice to
show that

$$\mathbf{E} S_n(s; \lambda(A_n)) = \mathbf{E} S_n(s; \lambda(A'_n)) + o(1).$$

By (7) and linearity of expectation, it suffices to show that

$$\mathbf{P}(\lambda_{i+1}(A_n) - \lambda_i(A_n) \le s) = \mathbf{P}(\lambda_{i+1}(A'_n) - \lambda_i(A'_n) \le s) + o(1)$$

uniformly for all $\varepsilon n \le i \le (1 - \varepsilon) n$, for each fixed $\varepsilon > 0$. But this follows by a
modification of the argument used to prove (18) (or (19)), using a function $G(x, y)$
of two variables which is a smooth approximant to the indicator function of the
half-space $\{y - x \le s\}$ (and using Theorem 19 to control the errors caused by shifting $s$); we
omit the details. The second part of the theorem will be treated together with
Theorem 11. □

Remark 31. By considering $\mathbf{P}(\{\lambda_{i+1}(A_n) - \lambda_i(A_n) \le s\} \wedge \{\lambda_{j+1}(A_n) - \lambda_j(A_n) \le s\})$
we can prove the universality of the variance of $S_n(s, \lambda)$. The same applies for
higher moments.



The proof of Theorem 11 is a little more complicated. We first need a strengthening
of ^ which may be of independent interest. 

Theorem 32 (Convergence to the semicircular law). Let $M_n$ be a Wigner Hermitian
matrix whose atom distribution $\xi$ has vanishing third moment. Then for any
fixed $c > 0$ and $\varepsilon > 0$, and any $\varepsilon n \le j \le (1-\varepsilon)n$, one has

$$\lambda_j(W_n) = t\left(\frac{j}{n}\right) + O(n^{-1+c})$$

asymptotically almost surely, where $t(\cdot)$ was defined in (3).



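Proof. It suffices to show that

$$\lambda_j(W_n) \in \left[ t\left(\frac{j}{n}\right) - n^{-1+c},\ t\left(\frac{j}{n}\right) + n^{-1+c} \right]$$

asymptotically almost surely. Let $M'_n$ be drawn from GUE, thus the off-diagonal
entries of $M_n$ and $M'_n$ match to third order. From (20) we have

$$\lambda_j(W'_n) \in \left[ t\left(\frac{j}{n}\right) - n^{-1+c}/2,\ t\left(\frac{j}{n}\right) + n^{-1+c}/2 \right]$$

asymptotically almost surely. The claim now follows from the last part of Theorem
15, letting $G = G(\lambda_j)$ be a smooth cutoff equal to 1 on $[t(\frac{j}{n})n - n^{c}/2,\ t(\frac{j}{n})n + n^{c}/2]$
and vanishing outside of $[t(\frac{j}{n})n - n^{c},\ t(\frac{j}{n})n + n^{c}]$. □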



Proof of Theorem 11. Fix $k, u$, and let $M_n$ be as in Theorem 11. By Corollary 30
we can find a Johansson matrix $M'_n$ whose entries match $M_n$ to fourth order. By
Theorem |S] (and a slight rescaling), it suffices to show that the quantity

(21) $$\int f(t_1, \ldots, t_k)\, \rho_n^{(k)}(nu + t_1, \ldots, nu + t_k)\, dt_1 \ldots dt_k$$

only changes by $o(1)$ when the matrix $M_n$ is replaced with $M'_n$, for any fixed test
function $f$. By an approximation argument we can take $f$ to be smooth.

We can rewrite the expression (21) as

(22) $$\sum_{1 \le i_1, \ldots, i_k \le n} \mathbf{E} f(\lambda_{i_1}(A_n) - nu, \ldots, \lambda_{i_k}(A_n) - nu).$$

Applying Theorem 15, we already have

$$\mathbf{E} f(\lambda_{i_1}(A_n) - nu, \ldots, \lambda_{i_k}(A_n) - nu) = \mathbf{E} f(\lambda_{i_1}(A'_n) - nu, \ldots, \lambda_{i_k}(A'_n) - nu) + O(n^{-c_0})$$

for each individual $i_1, \ldots, i_k$ and some absolute constant $c_0 > 0$. Meanwhile, by
Theorem 32, we see that asymptotically almost surely, the only $i_1, \ldots, i_k$ which
contribute to (22) lie within $O(n^{c})$ of $t^{-1}(u)n$, where $c > 0$ can be made arbitrarily
small. The claim then follows from the triangle inequality (choosing $c$ small enough
compared to $c_0$). □



Proof of the second part of Theorem 9. This proof is similar to the one above. We
already know that $\mathbf{P}(\lambda_{i+1} - \lambda_i \le s)$ is basically the same in the two models ($M_n$
and $M'_n$). Theorem 32 now shows that after fixing a small neighborhood of $u$, the
interval of indices $i$ that are involved fluctuates by at most $n^{c}$, where $c$ can be made
arbitrarily small. □

Remark 33. In fact, in the above applications, we only need Theorem 32 to hold
for Johansson matrices. Thus, in order to remove the third moment assumption, it
suffices to have this theorem for Johansson matrices (without the third moment
assumption). We believe this is within the power of the determinant process method,
but do not pursue this direction here.

As another application, we can prove the following asymptotic for the determinant
(and more generally, the characteristic polynomial) of a Wigner Hermitian matrix.
The detailed proof is deferred to Appendix A.

Theorem 34 (Asymptotic for determinant). Let $M_n$ be a Wigner Hermitian matrix
whose atom distribution $\xi$ has vanishing third moment and is supported on at least
three points. Then there is a constant $c > 0$ such that

$$\mathbf{P}\left( \left| \log|\det M_n| - \log\sqrt{n!} \right| \ge n^{1-c} \right) = o(1).$$

More generally, for any fixed complex number $z$, one has

$$\mathbf{P}\left( \left| \log|\det(M_n - \sqrt{n} z I)| - \frac{1}{2} n \log n - n \int \log|y - z|\, \rho_{sc}(y)\, dy \right| \ge n^{1-c} \right) = o(1)$$

where the decay rate of $o(1)$ is allowed to depend on $z$.
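As a numerical illustration of the first statement, one can sample a GUE matrix (a particular Wigner Hermitian matrix) and compare $\log|\det M_n|$ with $\log\sqrt{n!}$. The sketch below is only a sanity check at one moderate size, not evidence for the asymptotic rate; the helper names `sample_gue` and `log_abs_det_gap` are assumptions made for this example.

```python
import math
import numpy as np


def sample_gue(n, rng):
    """n x n GUE matrix: off-diagonal real and imaginary parts have
    variance 1/2, diagonal entries are real with variance 1."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (g + g.conj().T) / 2


def log_abs_det_gap(M):
    """Return log|det M| - log sqrt(n!) for a square matrix M."""
    n = M.shape[0]
    _, log_abs_det = np.linalg.slogdet(M)  # stable log of |det M|
    return log_abs_det - 0.5 * math.lgamma(n + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (50, 100, 200):
        print(n, log_abs_det_gap(sample_gue(n, rng)))
```

Using `slogdet` rather than `det` avoids overflow, since $|\det M_n|$ is of size roughly $\sqrt{n!}$.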

Remark 35. A similar result was established for iid random matrices in [44] (see
also [12] for a refinement), based on controlling the distance from a random vector
to a subspace. That method relied heavily on the joint independence of all entries
and does not seem to extend easily to the Hermitian case. We also remark that
a universality result for correlations of the characteristic polynomial has recently
been established in [21].

Let us now go beyond the model of Wigner Hermitian matrices. As already men- 
tioned, our main theorem also applies for real symmetric matrices. In the next 
paragraphs, we formulate a few results one can obtain in this direction. 

Definition 36 (Wigner symmetric matrices). Let $n$ be a large number. A Wigner
symmetric matrix (of size $n$) is a random symmetric matrix $M_n = (\zeta_{ij})_{1 \le i,j \le n}$
where for $1 \le i < j \le n$, $\zeta_{ij}$ are iid copies of a real random variable $\xi$ with mean
zero, variance 1, and exponential decay (as in Definition 1), while for $1 \le i = j \le n$,
$\zeta_{ii}$ are iid copies of a real random variable $\xi'$ with mean zero, variance 2, and
exponential decay. We set $W_n := \frac{1}{\sqrt{n}} M_n$ and $A_n := \sqrt{n} M_n$ as before.

Example 37. The Gaussian orthogonal ensemble (GOE) is the Wigner symmetric
matrix in which the off-diagonal atom distribution $\xi$ is the Gaussian $N(0,1)$, and
the diagonal atom distribution $\xi'$ is $N(0,2)$.

As remarked earlier, while the Wigner symmetric matrices do not, strictly speaking,
obey Condition C0 due to the diagonal variance being 2 instead of 1, it is not
hard to verify that all the results in this paper continue to hold after changing
the diagonal variance to 2. As a consequence, we can easily deduce the following
analogue of Theorems 9 and 11.

Theorem 38 (Universality for Random Symmetric Matrices). The limiting gap
distribution and $k$-correlation function of Wigner symmetric real matrices with
atom variable $\sigma$ satisfying $\mathbf{E}\sigma^3 = 0$ and $\mathbf{E}\sigma^4 = 3$ are the same as those for GOE.
(The explicit formulae for the limiting gap distribution and $k$-correlation function
for GOE can be found in [33[fT|. The limit of the $k$-correlation function is again in
the weak sense.)

The proof of Theorem 38 is similar to that of Theorems 9 and 11 and is omitted. The
reason that we need to match the moments to order 4 here (compared to lower
orders in Theorems 9 and 11) is that there is currently no analogue of Theorem 5
for the GOE. Once such a result becomes available, the order automatically reduces
to those in Theorems 9 and 11 respectively.

Finally let us mention that our results can be refined and extended in several 
directions. For instance, we can handle Hermitian matrices whose upper triangular 
entries are still independent, but having a non-trivial covariance matrix (the real 
and imaginary parts need not be independent). The diagonal entries can have 
mean different from zero (which, in the case the off-diagonal entries are gaussian, 
corresponds to gaussian matrices with external field and has been studied in [6]) 
and we can obtain universality results in this case as well. We can also refine our 
argument to prove universality near the edge of the spectrum. These extensions 
and many others will be discussed in a subsequent paper. 

1.7. Notation. We consider $n$ as an asymptotic parameter tending to infinity. We
use $X \ll Y$, $Y \gg X$, $Y = \Omega(X)$, or $X = O(Y)$ to denote the bound $X \le CY$ for all
sufficiently large $n$ and for some constant $C$. Notations such as $X \ll_k Y$, $X = O_k(Y)$
mean that the hidden constant $C$ depends on another constant $k$. $X = o(Y)$ or
$Y = \omega(X)$ means that $X/Y \to 0$ as $n \to \infty$; the rate of decay here will be allowed
to depend on other parameters. The eigenvalues are always ordered increasingly.

We view vectors $x \in \mathbb{C}^n$ as column vectors. The Euclidean norm of a vector
$x \in \mathbb{C}^n$ is defined as $\|x\| := (x^* x)^{1/2}$. The Frobenius norm $\|A\|_F$ of a matrix
is defined as $\|A\|_F = \mathrm{trace}(AA^*)^{1/2}$. Note that this bounds the operator norm
$\|A\|_{op} := \sup\{\|Ax\| : \|x\| = 1\}$ of the same matrix. We will also use the following
simple inequalities without further comment:

$$\|AB\|_F \le \|A\|_F \|B\|_{op}$$

and

$$\|B\|_{op} \le \|B\|_F$$

and hence

$$\|AB\|_F \le \|A\|_F \|B\|_F.$$
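These norm inequalities are easy to confirm numerically. The snippet below is a small sketch using NumPy; `frobenius_norm` and `operator_norm` are names chosen for this example, with the Frobenius norm computed directly from the trace definition above.

```python
import numpy as np


def frobenius_norm(M):
    """trace(M M^*)^{1/2}, following the definition in the text."""
    return np.sqrt(np.trace(M @ M.conj().T).real)


def operator_norm(M):
    """sup{ ||Mx|| : ||x|| = 1 }, i.e. the largest singular value."""
    return np.linalg.norm(M, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
    B = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
    print(frobenius_norm(A @ B), frobenius_norm(A) * operator_norm(B))
    print(operator_norm(B), frobenius_norm(B))
```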



2. Preliminaries: Tools from linear algebra and probability 

2.1. Tools from Linear Algebra. It is useful to keep in mind the (Courant-Fischer)
minimax characterization of the eigenvalues

$$\lambda_i(A) = \min_{V} \max_{u \in V} u^* A u$$

of a Hermitian $n \times n$ matrix $A$, where $V$ ranges over $i$-dimensional subspaces of $\mathbb{C}^n$
and $u$ ranges over unit vectors in $V$.

From this, one easily obtains Weyl's inequality

(23) $$\lambda_i(A) - \|B\|_{op} \le \lambda_i(A + B) \le \lambda_i(A) + \|B\|_{op}.$$

Another consequence of the minimax formula is the Cauchy interlacing inequality

(24) $$\lambda_i(A_n) \le \lambda_i(A_{n-1}) \le \lambda_{i+1}(A_n)$$

for all $1 \le i < n$, whenever $A_n$ is an $n \times n$ Hermitian matrix and $A_{n-1}$ is the top
$(n-1) \times (n-1)$ minor. In a similar spirit, one has

$$\lambda_i(A) \le \lambda_i(A + B) \le \lambda_{i+1}(A)$$

for all $1 \le i < n$, whenever $A, B$ are $n \times n$ Hermitian matrices with $B$ being positive
semi-definite and rank 1. If $B$ is instead negative semi-definite, one has

$$\lambda_i(A) \le \lambda_{i+1}(A + B) \le \lambda_{i+1}(A).$$

In either event, we conclude 

Lemma 39. Let $A, B$ be Hermitian matrices of the same size where $B$ has rank
one. Then for any interval $I$,

$$|N_I(A + B) - N_I(A)| \le 1,$$

where $N_I(M)$ is the number of eigenvalues of $M$ in $I$.
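Weyl's inequality, Cauchy interlacing, and Lemma 39 can all be checked on random examples. The following sketch assumes eigenvalues are listed in increasing order (which is what `numpy.linalg.eigvalsh` returns) and takes the "top minor" to be the upper-left block; the helper names are illustrative.

```python
import numpy as np


def random_hermitian(n, rng):
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (g + g.conj().T) / 2


def weyl_holds(A, B, tol=1e-9):
    """Check |lambda_i(A+B) - lambda_i(A)| <= ||B||_op for every i."""
    shift = np.abs(np.linalg.eigvalsh(A + B) - np.linalg.eigvalsh(A))
    return bool(np.all(shift <= np.linalg.norm(B, 2) + tol))


def cauchy_interlacing_holds(A, tol=1e-9):
    """Check lambda_i(A_n) <= lambda_i(A_{n-1}) <= lambda_{i+1}(A_n)."""
    lam_n = np.linalg.eigvalsh(A)
    lam_minor = np.linalg.eigvalsh(A[:-1, :-1])  # top-left (n-1) x (n-1) minor
    return bool(np.all(lam_n[:-1] <= lam_minor + tol)
                and np.all(lam_minor <= lam_n[1:] + tol))


def count_in_interval(M, a, b):
    """N_I(M) for the closed interval I = [a, b]."""
    lam = np.linalg.eigvalsh(M)
    return int(np.sum((lam >= a) & (lam <= b)))


def rank_one_count_change(A, v, c, a, b):
    """N_I(A + c v v^*) - N_I(A) for real c, so the perturbation has rank one."""
    B = c * np.outer(v, v.conj())
    return count_in_interval(A + B, a, b) - count_in_interval(A, a, b)
```

Taking `c` positive or negative covers the positive and negative semi-definite cases discussed before the lemma.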

One also has the following more precise version of the Cauchy interlacing inequality:

Lemma 40 (Interlacing identity). Let $A_n$ be an $n \times n$ Hermitian matrix, let $A_{n-1}$
be the top $(n-1) \times (n-1)$ minor, let $a_{nn}$ be the bottom right component, and let
$X \in \mathbb{C}^{n-1}$ be the rightmost column with the bottom entry $a_{nn}$ removed. Suppose
that $X$ is not orthogonal to any of the unit eigenvectors $u_j(A_{n-1})$ of $A_{n-1}$. Then
we have

(25) $$\sum_{j=1}^{n-1} \frac{|u_j(A_{n-1})^* X|^2}{\lambda_j(A_{n-1}) - \lambda_i(A_n)} = a_{nn} - \lambda_i(A_n)$$

for every $1 \le i \le n$.

Proof. By diagonalising $A_{n-1}$ (noting that this does not affect either side of (25)),
we may assume that $A_{n-1} = \mathrm{diag}(\lambda_1(A_{n-1}), \ldots, \lambda_{n-1}(A_{n-1}))$ and $u_j(A_{n-1}) = e_j$
for $j = 1, \ldots, n-1$. One then easily verifies that the characteristic polynomial
$\det(A_n - \lambda I)$ of $A_n$ is equal to

$$\prod_{j=1}^{n-1} (\lambda_j(A_{n-1}) - \lambda) \left[ (a_{nn} - \lambda) - \sum_{j=1}^{n-1} \frac{|u_j(A_{n-1})^* X|^2}{\lambda_j(A_{n-1}) - \lambda} \right]$$

when $\lambda$ is distinct from $\lambda_1(A_{n-1}), \ldots, \lambda_{n-1}(A_{n-1})$. Since $u_j(A_{n-1})^* X$ is non-zero
by hypothesis, we see that this polynomial does not vanish at any of the $\lambda_j(A_{n-1})$.
Substituting $\lambda_i(A_n)$ for $\lambda$, we obtain (25). □
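The interlacing identity lends itself to a direct numerical check. The sketch below evaluates both sides of the identity for every eigenvalue of a random Hermitian matrix; the function names are assumptions for this example, and a generic random matrix is used so that the non-orthogonality hypothesis holds almost surely.

```python
import numpy as np


def random_hermitian(n, rng):
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (g + g.conj().T) / 2


def interlacing_identity_residuals(A):
    """For each eigenvalue lambda_i(A_n), return
    sum_j |u_j(A_{n-1})^* X|^2 / (lambda_j(A_{n-1}) - lambda_i(A_n))
      - (a_nn - lambda_i(A_n)),
    which should vanish up to rounding error."""
    minor = A[:-1, :-1]           # top-left (n-1) x (n-1) minor
    X = A[:-1, -1]                # rightmost column without its bottom entry
    a_nn = A[-1, -1].real
    mu, U = np.linalg.eigh(minor)  # columns of U are unit eigenvectors
    weights = np.abs(U.conj().T @ X) ** 2
    lam = np.linalg.eigvalsh(A)
    lhs = np.array([np.sum(weights / (mu - l)) for l in lam])
    return lhs - (a_nn - lam)
```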



The following lemma will be useful to control the coordinates of eigenvectors.

Lemma 41. [H] Let

$$A_n = \begin{pmatrix} a & X^* \\ X & A_{n-1} \end{pmatrix}$$

be an $n \times n$ Hermitian matrix for some $a \in \mathbb{R}$ and $X \in \mathbb{C}^{n-1}$, and let $\begin{pmatrix} x \\ v \end{pmatrix}$ be a unit
eigenvector of $A_n$ with eigenvalue $\lambda_i(A_n)$, where $x \in \mathbb{C}$ and $v \in \mathbb{C}^{n-1}$. Suppose
that none of the eigenvalues of $A_{n-1}$ are equal to $\lambda_i(A_n)$. Then

$$|x|^2 = \frac{1}{1 + \sum_{j=1}^{n-1} (\lambda_j(A_{n-1}) - \lambda_i(A_n))^{-2} |u_j(A_{n-1})^* X|^2}$$

where $u_j(A_{n-1})$ is a unit eigenvector corresponding to the eigenvalue $\lambda_j(A_{n-1})$.

Proof. By subtracting $\lambda_i(A_n) I$ from $A_n$ we may assume $\lambda_i(A_n) = 0$. The eigenvector
equation then gives

$$xX + A_{n-1} v = 0,$$

thus

$$v = -x A_{n-1}^{-1} X.$$

Since $\|v\|^2 + |x|^2 = 1$, we conclude

$$|x|^2 (1 + \|A_{n-1}^{-1} X\|^2) = 1.$$

Since $\|A_{n-1}^{-1} X\|^2 = \sum_{j=1}^{n-1} (\lambda_j(A_{n-1}))^{-2} |u_j(A_{n-1})^* X|^2$, the claim follows. □
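Lemma 41 can be tested by comparing the first coordinate of each computed eigenvector with the right-hand side of the formula. This is a sketch under the block convention of the lemma (the entry $a$ in the top-left corner and $A_{n-1}$ the bottom-right block); the function names are chosen for this example only.

```python
import numpy as np


def random_hermitian(n, rng):
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (g + g.conj().T) / 2


def first_coordinate_sq(A):
    """Return (direct, formula): |x|^2 read off from the unit eigenvectors
    of A, and the same quantity computed from the spectrum of A_{n-1}."""
    lam, V = np.linalg.eigh(A)
    direct = np.abs(V[0, :]) ** 2      # first coordinate of each eigenvector
    minor = A[1:, 1:]                  # bottom-right block A_{n-1}
    X = A[1:, 0]                       # first column without its top entry
    mu, U = np.linalg.eigh(minor)
    weights = np.abs(U.conj().T @ X) ** 2
    formula = np.array(
        [1.0 / (1.0 + np.sum(weights / (mu - l) ** 2)) for l in lam]
    )
    return direct, formula
```

Since the eigenvectors form an orthonormal basis, the values $|x|^2$ over all $i$ sum to 1, which gives a second consistency check.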



The Stieltjes transform $s_n(z)$ of a Hermitian matrix $W$ is defined for complex $z$
by the formula

$$s_n(z) := \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\lambda_i(W) - z}.$$

It has the following alternate representation (see e.g. [2, Chapter 11]):

Lemma 42. Let $W = (\zeta_{ij})_{1 \le i,j \le n}$ be a Hermitian matrix, and let $z$ be a complex
number not in the spectrum of $W$. Then we have

$$s_n(z) = \frac{1}{n} \sum_{k=1}^{n} \frac{1}{\zeta_{kk} - z - a_k^* (W_k - zI)^{-1} a_k}$$

where $W_k$ is the $(n-1) \times (n-1)$ matrix with the $k$-th row and column removed, and
$a_k \in \mathbb{C}^{n-1}$ is the $k$-th column of $W$ with the $k$-th entry removed.

Proof. By Schur's complement, $\frac{1}{\zeta_{kk} - z - a_k^* (W_k - zI)^{-1} a_k}$ is the $k$-th diagonal entry of
$(W - zI)^{-1}$. Taking traces, one obtains the claim. □
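The two expressions for the Stieltjes transform can be compared numerically. The sketch below evaluates the spectral definition and the Schur-complement representation at a non-real point $z$, where all the inverses involved automatically exist; the function names are illustrative.

```python
import numpy as np


def random_hermitian(n, rng):
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (g + g.conj().T) / 2


def stieltjes_direct(W, z):
    """(1/n) sum_i 1 / (lambda_i(W) - z)."""
    lam = np.linalg.eigvalsh(W)
    return np.mean(1.0 / (lam - z))


def stieltjes_schur(W, z):
    """(1/n) sum_k 1 / (W_kk - z - a_k^* (W_k - z I)^{-1} a_k)."""
    n = W.shape[0]
    total = 0.0
    for k in range(n):
        keep = [j for j in range(n) if j != k]
        W_k = W[np.ix_(keep, keep)]   # remove k-th row and column
        a_k = W[keep, k]              # k-th column without its k-th entry
        quad = a_k.conj() @ np.linalg.solve(W_k - z * np.eye(n - 1), a_k)
        total += 1.0 / (W[k, k] - z - quad)
    return total / n
```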

2.2. Tools from Probability. We will make frequent use of the following lemma,
whose proof is presented in Appendix B. This lemma is a generalization of a result
in 

Lemma 43 (Distance between a random vector and a subspace). Let $X = (\xi_1, \ldots, \xi_n) \in \mathbb{C}^n$
be a random vector whose entries are independent with mean zero, variance
1, and are bounded in magnitude by $K$ almost surely for some $K$, where
$K \ge 10(\mathbf{E}|\xi|^4 + 1)$. Let $H$ be a subspace of dimension $d$ and $\pi_H$ the orthogonal
projection onto $H$. Then

$$\mathbf{P}\left( \left| \|\pi_H(X)\| - \sqrt{d} \right| > t \right) \le 10 \exp\left( -\frac{t^2}{10K^2} \right).$$

In particular, one has

$$\|\pi_H(X)\| = \sqrt{d} + O(K \log n)$$

with overwhelming probability.
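A quick simulation illustrates the concentration of $\|\pi_H(X)\|$ around $\sqrt{d}$. The sketch below assumes random sign entries (mean zero, variance 1, bounded by 1) and a subspace $H$ spanned by the orthonormal columns of a QR factor; these choices and the helper names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def random_subspace_basis(n, d, rng):
    """Orthonormal basis (as columns) of a d-dimensional subspace of R^n."""
    q, _ = np.linalg.qr(rng.normal(size=(n, d)))
    return q


def projection_norm(basis, x):
    """||pi_H(x)|| when the columns of `basis` are orthonormal and span H."""
    return np.linalg.norm(basis.conj().T @ x)


def deviations(n, d, trials, rng):
    """Samples of ||pi_H(X)|| - sqrt(d) for random sign vectors X."""
    basis = random_subspace_basis(n, d, rng)
    out = []
    for _ in range(trials):
        x = rng.choice([-1.0, 1.0], size=n)
        out.append(projection_norm(basis, x) - np.sqrt(d))
    return np.array(out)


if __name__ == "__main__":
    dev = deviations(400, 100, 50, np.random.default_rng(3))
    print(np.max(np.abs(dev)))
```

In such experiments the deviation stays of order 1 while $\sqrt{d} = 10$, consistent with the lemma.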

Another useful tool is the following theorem, which is a corollary of a more general
theorem proved in Appendix D.

Theorem 44 (Tail bounds for complex random walks). Let $1 \le N \le n$ be integers,
and let $A = (a_{i,j})_{1 \le i \le N; 1 \le j \le n}$ be an $N \times n$ complex matrix whose $N$ rows are
orthonormal in $\mathbb{C}^n$, and obeying the incompressibility condition

(26) $$\sup_{1 \le i \le N; 1 \le j \le n} |a_{i,j}| \le \sigma$$

for some $\sigma > 0$. Let $\zeta_1, \ldots, \zeta_n$ be independent complex random variables with mean
zero, variance $\mathbf{E}|\zeta_j|^2$ equal to 1, and obeying $\mathbf{E}|\zeta_j|^3 \le C$ for some $C \ge 1$. For each
$1 \le i \le N$, let $S_i$ be the complex random variable

$$S_i := \sum_{j=1}^{n} a_{i,j} \zeta_j$$

and let $\vec{S}$ be the $\mathbb{C}^N$-valued random variable with coefficients $S_1, \ldots, S_N$.

• (Upper tail bound on $S_i$) For $t \ge 1$, we have $\mathbf{P}(|S_i| \ge t) \ll \exp(-ct^2) + C\sigma$
for some absolute constant $c > 0$.

• (Lower tail bound on $\vec{S}$) For any $t \le \sqrt{N}$, one has $\mathbf{P}(|\vec{S}| \le t) \ll O(t/\sqrt{N})^{\lfloor N/4 \rfloor} +$

3. Overview of argument 

We now give a high-level proof of our main results, Theorem 15 and Theorem 19,
contingent on several technical propositions that we prove in later sections.

3.1. Preliminary truncation. In the hypotheses of Theorem 15 and Theorem
19 it is assumed that one has the uniform exponential decay property (14) on the
coefficients $\zeta_{ij}$ of the random matrix $M_n$. From this and the union bound, we thus
see that

$$\sup_{1 \le i,j \le n} |\zeta_{ij}| \le \log^{C+1} n$$

with overwhelming probability. Since events of probability less than, say, $O(n^{-100})$
are negligible for the conclusion of either Theorem 15 or Theorem 19, we may thus
apply a standard truncation argument (see e.g. [5]) and redefine the atom variables
$\zeta_{ij}$ on the events where their magnitude exceeds $\log^{C+1} n$, so that one in fact has

(27) $$\sup_{1 \le i,j \le n} |\zeta_{ij}| \le \log^{C+1} n$$

almost surely. (This modification may affect the first, second, third, and fourth
moments on the real and imaginary parts of the $\zeta_{ij}$ by a very small factor (e.g.
$O(n^{-10})$), but one can easily compensate for this by further adjustment of the $\zeta_{ij}$,
using the Weyl inequalities (23) if necessary; we omit the details.) Thus we will
henceforth assume that (27) holds for proving both Theorem 15 and Theorem 19.

Remark 45. If one only assumed some finite number of moment conditions on
$\zeta_{ij}$, rather than the exponential condition (14), then one could only truncate the
$|\zeta_{ij}|$ to be of size $n^{1/C_0}$ for some constant $C_0$ rather than polylogarithmic in $n$.
While several of our arguments extend to this setting, there is a key induction on $n$
argument in Section [331 that seems to require $|\zeta_{ij}|$ to be of size $n^{o(1)}$ or better, which
is the main reason why our results are restricted to random variables of exponential
decay. However, this appears to be a largely technical restriction, and it seems very
plausible that the results of this paper can be extended to atom distributions that
are only assumed to have a finite number of moments bounded.

For technical reasons, it is also convenient to make the qualitative assumption that 
the Qj have an (absolutely) continuous distribution in the complex plane, rather 
than a discrete one. This is so that pathological events such as eigenvalue collision 
will only occur with probability zero and can thus be ignored (though one of course 
still must deal with the event that two eigenvalues have an extremely small but non- 
zero separation). None of our bounds will depend on any quantitative measure of 
how continuous the Qj are, so one can recover the discrete case from the continuous 
one by a standard limiting argument (approximating a discrete distribution by a 
smooth one while holding n fixed, and using the Weyl inequalities (23) to justify
the limiting process); we omit the details. 

3.2. Proof strategy for Theorem 15. For sake of exposition let us restrict attention
to the case $k = 1$, thus we wish to show that the expectation $\mathbf{E} G(\lambda_i(A_n))$
of the random variable $G(\lambda_i(A_n))$ only changes by $O(n^{-c_0})$ if one replaces $A_n$ with
another random matrix $A'_n$ with moments matching up to fourth order off the diagonal
(and up to second order on the diagonal). To further simplify the exposition,
let us suppose that the coefficients $\zeta_{pq}$ of $A_n$ (or $A'_n$) are real-valued rather than
complex-valued.

At present, $A'_n$ differs from $A_n$ in all components. But suppose we make a
much milder change to $A_n$, namely replacing a single entry $\sqrt{n}\zeta_{pq}$ of $A_n$ with its
counterpart $\sqrt{n}\zeta'_{pq}$ for some $1 \le p \le q \le n$. If $p \ne q$, one also needs to replace the
companion entry $\sqrt{n}\zeta_{qp} = \sqrt{n}\,\overline{\zeta_{pq}}$ with $\sqrt{n}\zeta'_{qp} = \sqrt{n}\,\overline{\zeta'_{pq}}$, to maintain the Hermitian
property. This creates another random matrix $\tilde{A}_n$ which differs from $A_n$ in at
most two entries. Note that $\tilde{A}_n$ continues to obey Condition C0, and has matching
moments with either $A_n$ or $A'_n$ up to fourth order off the diagonal, and up to second
order on the diagonal.

Suppose that one could show that $\mathbf{E} G(\lambda_i(A_n))$ differed from $\mathbf{E} G(\lambda_i(\tilde{A}_n))$ by at
most $n^{-2-c_0}$ when $p \ne q$ and by at most $n^{-1-c_0}$ when $p = q$. Then, by applying
this swapping procedure once for each pair $1 \le p \le q \le n$ and using the triangle
inequality, one would obtain the desired bound $|\mathbf{E} G(\lambda_i(A_n)) - \mathbf{E} G(\lambda_i(A'_n))| = O(n^{-c_0})$.
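The bookkeeping behind this triangle-inequality step can be written out explicitly. The display below is only a sketch of the arithmetic, counting $n(n-1)/2$ off-diagonal swaps and $n$ diagonal swaps and using the per-swap bounds $n^{-2-c_0}$ and $n^{-1-c_0}$ stated above.

```latex
\begin{align*}
\bigl| \mathbf{E} G(\lambda_i(A_n)) - \mathbf{E} G(\lambda_i(A'_n)) \bigr|
 &\le \underbrace{\tfrac{n(n-1)}{2}}_{\text{swaps with } p < q} \cdot\, n^{-2-c_0}
   \;+\; \underbrace{n}_{\text{swaps with } p = q} \cdot\, n^{-1-c_0} \\
 &\le \tfrac{1}{2}\, n^{-c_0} + n^{-c_0} \;=\; O(n^{-c_0}).
\end{align*}
```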

Now let us see why we would expect $\mathbf{E} G(\lambda_i(A_n))$ to differ from $\mathbf{E} G(\lambda_i(\tilde{A}_n))$ by
such a small amount. For sake of concreteness let us restrict attention to the
off-diagonal case $p \ne q$, where we have four matching moments; the diagonal case
$p = q$ is similar but one only assumes two matching moments, which is ultimately
responsible for the $n^{-1-c_0}$ error rather than $n^{-2-c_0}$.

Let us freeze (or condition on) all the entries of $A_n$ except for the $pq$ and $qp$
entries. For any complex number $z$, let $A(z)$ denote the matrix which equals $A_n$
except at the $pq$, $qp$ entries, where it equals $z$ and $\bar{z}$ respectively. (Actually, with
our hypotheses, we only need to consider real-valued $z$.) Thus it would suffice to
show that

(28) $$\mathbf{E} F(\sqrt{n}\zeta_{pq}) = \mathbf{E} F(\sqrt{n}\zeta'_{pq}) + O(n^{-2-c_0})$$

for all (or at least most) choices of the frozen entries of $A_n$, where $F(z) :=
G(\lambda_i(A(z)))$. Note from (27) that we only care about values of $z$ of size $O(n^{1/2+o(1)})$.

Suppose we could show the derivative estimates

(29) $$\frac{d^l}{dz^l} F(z) = O(n^{-l+O(c_0)+o(1)})$$

for $l = 1, 2, 3, 4, 5$. (If $z$ were complex-valued rather than real, we would need to
differentiate in the real and imaginary parts of $z$ separately, as $F$ is not holomorphic,
but let us ignore this technicality for now.) Then by Taylor's theorem with
remainder, we would have

$$F(z) = F(0) + F'(0)z + \ldots + \frac{1}{4!} F^{(4)}(0) z^4 + O(n^{-5+O(c_0)+o(1)} |z|^5)$$

and so in particular (using (27))

$$F(\sqrt{n}\zeta_{pq}) = F(0) + F'(0)\sqrt{n}\zeta_{pq} + \ldots + \frac{1}{4!} F^{(4)}(0) (\sqrt{n})^4 \zeta_{pq}^4 + O(n^{-5/2+O(c_0)+o(1)})$$

and similarly for $F(\sqrt{n}\zeta'_{pq})$. Since $n^{-5/2+O(c_0)+o(1)} = O(n^{-2-c_0})$ for $n$ large enough
and $c_0$ small enough, we thus obtain the claim (28) thanks to the hypothesis that
the first four moments of $\zeta_{pq}$ and $\zeta'_{pq}$ match. (Note how this argument barely
fails if only three moments are assumed to match, though it is possible that some
refinement of this argument might still succeed by exploiting further cancellations
in the fourth order term $\frac{1}{4!} F^{(4)}(0) (\sqrt{n})^4 \zeta_{pq}^4$.)
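To see the moment-matching cancellation and the "barely fails" remark side by side, one can take expectations of the two Taylor expansions and subtract. The display below is a sketch in the simplified real-valued setting of this subsection, using that $F$ and its derivatives at $0$ depend only on the frozen entries.

```latex
\begin{align*}
\mathbf{E} F(\sqrt{n}\,\zeta_{pq}) - \mathbf{E} F(\sqrt{n}\,\zeta'_{pq})
 &= \sum_{l=0}^{4} \frac{F^{(l)}(0)}{l!}\, n^{l/2}
    \bigl( \mathbf{E}\zeta_{pq}^{\,l} - \mathbf{E}(\zeta'_{pq})^{l} \bigr)
    + O\bigl( n^{-5/2+O(c_0)+o(1)} \bigr) \\
 &= O\bigl( n^{-5/2+O(c_0)+o(1)} \bigr)
    \qquad \text{(all moment differences with } l \le 4 \text{ vanish).}
\end{align*}
% If only three moments match, the l = 4 term survives, and the l = 4 case
% of the derivative estimates only gives
\[
\frac{F^{(4)}(0)}{4!}\, n^{2}\,
\bigl( \mathbf{E}\zeta_{pq}^{4} - \mathbf{E}(\zeta'_{pq})^{4} \bigr)
 = O\bigl( n^{-4+O(c_0)+o(1)} \bigr) \cdot n^{2}
 = O\bigl( n^{-2+O(c_0)+o(1)} \bigr),
\]
% which falls just short of the required O(n^{-2-c_0}).
```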

Now we discuss why one would expect an estimate such as (29) to be plausible.
For simplicity we first focus attention on the easiest case $l = 1$, thus we now wish
to show that $F'(z) = O(n^{-1+O(c_0)+o(1)})$. By (15) and the chain rule, it suffices to
show that

$$\frac{d}{dz} \lambda_i(A(z)) = O(n^{-1+O(c_0)+o(1)}).$$

A crude application of the Weyl bound (23) gives $\frac{d}{dz} \lambda_i(A(z)) = O(1)$, which is not
good enough for what we want (although in the actual proof, we will take advantage
of a variant of this crude bound to round $z$ off to the nearest multiple of $n^{-100}$,
which is useful for technical reasons relating to the union bound). But we can do
better by recalling the Hadamard first variation formula

$$\frac{d}{dz} \lambda_i(A(z)) = u_i(A(z))^* A'(z) u_i(A(z))$$

where we recall that $u_i(A(z))$ is the $i$-th eigenvector of $A(z)$, normalized to be of unit
magnitude. By construction, $A'(z) = e_p e_q^* + e_q e_p^*$, where $e_1, \ldots, e_n$ are the basis
vectors of $\mathbb{C}^n$. So to obtain the claim, one needs to show that the coefficients of
$u_i(A(z))$ have size $O(n^{-1/2+o(1)})$. This type of delocalization result for eigenvectors
has recently been established (with overwhelming probability) by Erdős, Schlein,
and Yau in [17, 18, 19] for Wigner Hermitian matrices, assuming some quantitative
control on the continuous distribution of the $\zeta_{pq}$. (A similar, but weaker, argument
was used in [46] with respect to non-Hermitian random matrices; see [46, Section
4] and [18, Appendix F].) With some extra care and a new tool (Lemma 43), we
are able to extend their arguments to cover the current more general setting (see
Proposition 62 and Corollary 63), with a slightly simpler proof. Also, $z$ ranges over
uncountably many possibilities, so one cannot apply the union bound to each
instance of $z$ separately; instead, one must perform the rounding trick mentioned
earlier.
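The Hadamard first variation formula is easy to test against a finite difference. The following sketch works in the real symmetric setting of this subsection, where $A'(z) = e_p e_q^* + e_q e_p^*$ and hence $u_i^* A'(z) u_i = 2 u_i[p] u_i[q]$; the function names and the test matrix are assumptions made for illustration.

```python
import numpy as np


def family(A0, p, q, z):
    """A(z) = A(0) + z (e_p e_q^T + e_q e_p^T) for real z and p != q."""
    A = A0.copy()
    A[p, q] += z
    A[q, p] += z
    return A


def first_variation(A0, p, q, i, z):
    """u_i^T A'(z) u_i = 2 u_i[p] u_i[q] for the unit eigenvector u_i of A(z)."""
    _, U = np.linalg.eigh(family(A0, p, q, z))
    u = U[:, i]
    return 2.0 * u[p] * u[q]


def first_difference(A0, p, q, i, z, h=1e-5):
    """Central finite-difference approximation of d/dz lambda_i(A(z))."""
    up = np.linalg.eigvalsh(family(A0, p, q, z + h))[i]
    down = np.linalg.eigvalsh(family(A0, p, q, z - h))[i]
    return (up - down) / (2.0 * h)
```

The formula makes the role of delocalization transparent: if both coordinates $u_i[p]$ and $u_i[q]$ are of size $n^{-1/2+o(1)}$, the derivative is of size $n^{-1+o(1)}$.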



Now suppose we wish to establish the $l = 2$ version of (29). Again applying the
chain rule, we would now seek to establish the bound

(30) $$\frac{d^2}{dz^2} \lambda_i(A(z)) = O(n^{-2+O(c_0)+o(1)}).$$

For this, we apply the Hadamard second variation formula

$$\frac{d^2}{dz^2} \lambda_i(A(z)) = -2 u_i(A(z))^* A'(z) (A(z) - \lambda_i(A(z)) I)^{-1} \pi_{u_i(A(z))^\perp} A'(z) u_i(A(z)),$$

where $\pi_{u_i(A(z))^\perp}$ is the orthogonal projection to the orthogonal complement $u_i(A(z))^\perp$
of $u_i(A(z))$, and $(A(z) - \lambda_i(A(z)) I)^{-1}$ is the inverse of $A(z) - \lambda_i(A(z)) I$ on that
orthogonal complement. (This formula is valid as long as the eigenvalues $\lambda_j(A(z))$
are simple, which is almost surely the case due to the hypothesis of continuous
distribution.) One can expand out the right-hand side in terms of the other
(unit-normalized) eigenvectors $u_j(A(z))$, $j \ne i$ as

$$\frac{d^2}{dz^2} \lambda_i(A(z)) = -2 \sum_{j \ne i} \frac{|u_j(A(z))^* A'(z) u_i(A(z))|^2}{\lambda_j(A(z)) - \lambda_i(A(z))}.$$

By using Erdős-Schlein-Yau type estimates one expects $|u_j(A(z))^* A'(z) u_i(A(z))|$ to
be of size about $O(n^{-1+o(1)})$, while from Theorem 19 we expect $|\lambda_j(z) - \lambda_i(z)|$ to be
bounded below by $n^{-c_0}$ with high probability, and so the claim (30) is plausible (one
still needs to sum over $j$, of course, but one expects $\lambda_j(z) - \lambda_i(z)$ to grow roughly
linearly in $j$ and so this should only contribute a logarithmic factor $O(\log n) =
O(n^{o(1)})$ at worst). So we see for the first time how Theorem 19 is going to be an
essential component in the proof of Theorem 15. Similar considerations also apply
to the third, fourth, and fifth derivatives of $\lambda_i(A(z))$, though as one might imagine
the formulae become more complicated.
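The eigenvector expansion of the second variation can likewise be compared with a second-order finite difference. The sketch below again assumes the real symmetric case with $A'(z) = e_p e_q^* + e_q e_p^*$ and simple eigenvalues; names and test data are illustrative.

```python
import numpy as np


def family(A0, p, q, z):
    """A(z) = A(0) + z (e_p e_q^T + e_q e_p^T) for real z and p != q."""
    A = A0.copy()
    A[p, q] += z
    A[q, p] += z
    return A


def second_variation(A0, p, q, i, z):
    """-2 * sum_{j != i} |u_j^T A' u_i|^2 / (lambda_j - lambda_i)."""
    lam, U = np.linalg.eigh(family(A0, p, q, z))
    n = len(lam)
    D = np.zeros((n, n))
    D[p, q] = D[q, p] = 1.0            # D is A'(z), independent of z
    c = U.T @ D @ U[:, i]              # c[j] = u_j^T A' u_i
    total = 0.0
    for j in range(n):
        if j != i:
            total += c[j] ** 2 / (lam[j] - lam[i])
    return -2.0 * total


def second_difference(A0, p, q, i, z, h=1e-4):
    """Central second difference of z -> lambda_i(A(z))."""
    f = lambda w: np.linalg.eigvalsh(family(A0, p, q, w))[i]
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / h ** 2
```

The denominators $\lambda_j - \lambda_i$ show directly why a lower bound on eigenvalue gaps is needed to control the second derivative.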

There is however a technical difficulty that arises, namely that the lower bound

$$|\lambda_j(A(z)) - \lambda_i(A(z))| \ge n^{-c_0}$$

holds with high probability, but not with overwhelming probability (see Definition
3 for definitions). Indeed, given that eigenvalue collision is a codimension two
event for real symmetric matrices and codimension three for Hermitian ones, one
expects the failure probability to be about $n^{-2c_0}$ in the real case and $n^{-3c_0}$ in the
complex case (this heuristic is also supported by the gap statistics for GOE and
GUE). As one needs to take the union bound over many values of $z$ (about $n^{100}$ or
so), this presents a significant problem. However, this difficulty can be avoided by
going back to the start of the argument and replacing the quantity $G(\lambda_i(z))$ with
a "regularized" variant which vanishes whenever $\lambda_i(z)$ gets too close to another
eigenvalue. To do this, it is convenient to introduce the quantity

$$Q_i(A(z)) := \sum_{j \ne i} \frac{1}{|\lambda_j(A(z)) - \lambda_i(A(z))|^2};$$

this quantity is normally of size $O(1)$, but becomes large precisely when the gap
between $\lambda_i(A(z))$ and other eigenvalues becomes small. The strategy is then to
replace $G(\lambda_i(A(z)))$ by a truncated variant $G(\lambda_i(A(z)), Q_i(A(z)))$ which is supported
on the region where $Q_i$ is not too large (e.g. of size at most $n^{c_0}$), and apply the
swapping strategy to the latter quantity instead. (For this, one needs control on
derivatives of $Q_i(A(z))$ as well as on $\lambda_i(A(z))$, but it turns out that such bounds
are available; this smoothness of $Q_i$ is one reason why we work with $Q_i$ in the first
place, rather than more obvious alternatives such as $\inf_{j \ne i} |\lambda_j(A(z)) - \lambda_i(A(z))|$.)
Finally, to remove the truncation at the beginning and end of the iterated swapping
process, one appeals to Theorem 19. Notice that this result is now only used twice,
rather than $O(n^2)$ or $O(n^{100})$ times, and so the total error probability remains
acceptably bounded.

One way to interpret this truncation trick is that while the "bad event" that $Q_i$
is large has reasonably large probability (of order about $n^{-c_0}$), which makes the
union bound ineffective, the $Q_i$ does not change too violently when swapping one
or more of the entries of the random matrix, and so one is essentially faced with the
same bad event throughout the $O(n^2)$ different swaps (or throughout the $O(n^{100})$
or so different values of $z$). So the union bound is actually far from the truth in
this case.

3.3. High-level proof of Theorem 15. We now begin the rigorous proof of Theorem
15, breaking it down into simpler propositions which will be proven in subsequent
sections.

The heart of the argument consists of two key propositions. The first proposition
asserts that one can swap a single coefficient (or more precisely, two coefficients)
of a (deterministic) matrix $A$ as long as $A$ obeys a certain "good configuration
condition":

Proposition 46 (Replacement given a good configuration). There exists a positive
constant $C_1$ such that the following holds. Let $k \ge 1$ and $\varepsilon_1 > 0$, and assume $n$
sufficiently large depending on these parameters. Let $1 \le i_1 < \ldots < i_k \le n$. For
a complex parameter $z$, let $A(z)$ be a (deterministic) family of $n \times n$ Hermitian
matrices of the form

$$A(z) = A(0) + z e_p e_q^* + \bar{z} e_q e_p^*$$

where $e_p, e_q$ are unit vectors. We assume that for every $1 \le j \le k$ and every
$|z| \le n^{1/2+\varepsilon_1}$ whose real and imaginary parts are multiples of $n^{-C_1}$, we have

• (Eigenvalue separation) For any $1 \le i \le n$ with $|i - i_j| \ge n^{\varepsilon_1}$, we have

(31) $$|\lambda_i(A(z)) - \lambda_{i_j}(A(z))| \ge n^{-\varepsilon_1} |i - i_j|.$$

• (Delocalization at $i_j$) If $P_{i_j}(A(z))$ is the orthogonal projection to the eigenspace
associated to $\lambda_{i_j}(A(z))$, then

(32) $$\|P_{i_j}(A(z)) e_p\|, \|P_{i_j}(A(z)) e_q\| \le n^{-1/2+\varepsilon_1}.$$

• For every $\alpha \ge 0$

(33) $$\|P_{i_j,\alpha}(A(z)) e_p\|, \|P_{i_j,\alpha}(A(z)) e_q\| \le 2^{\alpha/2} n^{-1/2+\varepsilon_1},$$

whenever $P_{i_j,\alpha}$ is the orthogonal projection to the eigenspaces corresponding
to eigenvalues $\lambda_i(A(z))$ with $2^{\alpha} \le |i - i_j| < 2^{\alpha+1}$.

We say that $A(0), e_p, e_q$ are a good configuration for $i_1, \ldots, i_k$ if the above properties
hold. Assuming this good configuration, then we have

(34) $$\mathbf{E}(F(\zeta)) = \mathbf{E} F(\zeta') + O(n^{-(r+1)/2+O(\varepsilon_1)}),$$

whenever

$$F(z) := G(\lambda_{i_1}(A(z)), \ldots, \lambda_{i_k}(A(z)), Q_{i_1}(A(z)), \ldots, Q_{i_k}(A(z)))$$

and

$$G = G(\lambda_{i_1}, \ldots, \lambda_{i_k}, Q_{i_1}, \ldots, Q_{i_k})$$

is a smooth function from $\mathbb{R}^k \times \mathbb{R}_+^k \to \mathbb{R}$ that is supported on the region

$$Q_{i_1}, \ldots, Q_{i_k} \le n^{\varepsilon_1}$$

and obeys the derivative bounds

$$|\nabla^j G| \le n^{\varepsilon_1}$$

for all $0 \le j \le 5$, and $\zeta, \zeta'$ are random variables with $|\zeta|, |\zeta'| \le n^{1/2+\varepsilon_1}$ almost
surely, which match to order $r$ for some $r = 2, 3, 4$.

If $G$ obeys the improved derivative bounds

$$|\nabla^j G| \le n^{-Cj\varepsilon_1}$$

for $0 \le j \le 5$ and some sufficiently large absolute constant $C$, then we can
strengthen $n^{-(r+1)/2+O(\varepsilon_1)}$ in (34) to $n^{-(r+1)/2-\varepsilon_1}$.

Remark 47. The need to restrict $z$ to multiples of $n^{-C_1}$, as opposed to all complex
$z$ in the disk of radius $n^{1/2+\varepsilon_1}$, is so that we can verify the hypotheses in the
next proposition using the union bound (so long as the events involved hold with
overwhelming probability). For $C_1$ large enough, we will be able to use rounding
methods to pass from the discrete setting of multiples of $n^{-C_1}$ to the continuous
setting of arbitrary complex numbers in the disk without difficulty.



We prove this proposition in Section 4. To use this proposition, we of course need
to have the good configuration property hold often. This leads to the second key
proposition:

Proposition 48 (Good configurations occur very frequently). Let $\varepsilon, \varepsilon_1 > 0$ and
$C, C_1, k \ge 1$. Let $\varepsilon n \le i_1 < \ldots < i_k \le (1-\varepsilon)n$, let $1 \le p, q \le n$, let $e_1, \ldots, e_n$
be the standard basis of $\mathbb{C}^n$, and let $A(0) = (\zeta_{ij})_{1 \le i,j \le n}$ be a random Hermitian
matrix with independent upper-triangular entries and $|\zeta_{ij}| \le n^{1/2} \log^{C} n$ for all
$1 \le i, j \le n$, with $\zeta_{pq} = \zeta_{qp} = 0$, but with $\zeta_{ij}$ having mean zero and variance 1
for all other $ij$, and also being distributed continuously in the complex plane. Then
$A(0), e_p, e_q$ obey the Good Configuration Condition in Proposition 46 for $i_1, \ldots, i_k$ and
with the indicated value of $\varepsilon_1, C_1$ with overwhelming probability.

We will prove this proposition in Section 5.

Given these two propositions (and Theorem 19) we can now prove Theorem 15.
As discussed at the beginning of the section, we may assume that the $\zeta_{ij}$ are
continuously distributed in the complex plane and obey the bound (27).

Let $0 < \varepsilon < 1$ and $k \ge 1$, and assume $c_0$ is sufficiently small and $C_1$ sufficiently
large. Let $M_n, M'_n, \zeta_{ij}, \zeta'_{ij}, A_n, A'_n, G, i_1, \ldots, i_k$ be as in Theorem 15.

We first need 

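Lemma 49. For each $1 \le j \le k$, one has $Q_{i_j}(A_n) \le n^{c_0}$ with high probability.

Proof. For brevity we omit the $A_n$ variable. Fix $j$. Suppose that $Q_{i_j} > n^{c_0}$, then

$$\sum_{i \ne i_j} \frac{1}{|\lambda_i - \lambda_{i_j}|^2} > n^{c_0}$$

and so by the pigeonhole principle there exists an integer $0 \le m \ll \log n$ such that

$$\sum_{2^m \le |i - i_j| < 2^{m+1}} \frac{1}{|\lambda_i - \lambda_{i_j}|^2} \gg 2^{-m/2} n^{c_0}$$

which implies that

$$|\lambda_{i_j+2^m} - \lambda_{i_j}| \ll 2^{3m/4} n^{-c_0/2}$$

or

$$|\lambda_{i_j-2^m} - \lambda_{i_j}| \ll 2^{3m/4} n^{-c_0/2}.$$

It thus suffices to show that

$$\mathbf{P}(|\lambda_{i_j+2^m} - \lambda_{i_j}| \ll 2^{3m/4} n^{-c_0/2}) \le n^{-c_1}$$

uniformly in $m$ (and similarly for $\lambda_{i_j-2^m}$), since the $\log n$ loss caused by the number
of $m$'s can easily be absorbed into the right-hand side.

Fix $m$. Suppose that $|\lambda_{i_j+2^m} - \lambda_{i_j}| \ll 2^{3m/4} n^{-c_0/2}$; then expressing the left-hand
side as $\sum_{k=0}^{2^m-1} \lambda_{i_j+k+1} - \lambda_{i_j+k}$ and using Markov's inequality we see that

$$\lambda_{i_j+k+1} - \lambda_{i_j+k} \ll n^{-c_0/2}$$

for $\gg 2^m$ values of $k$, and thus

$$\mathbf{P}(|\lambda_{i_j+2^m} - \lambda_{i_j}| \ll 2^{3m/4} n^{-c_0/2}) \ll \mathbf{E} \frac{1}{2^m} \sum_{k=0}^{2^m-1} \mathbf{I}(\lambda_{i_j+k+1} - \lambda_{i_j+k} \ll n^{-c_0/2})$$

and hence by linearity of expectation

$$\mathbf{P}(|\lambda_{i_j+2^m} - \lambda_{i_j}| \ll 2^{3m/4} n^{-c_0/2}) \ll \frac{1}{2^m} \sum_{k=0}^{2^m-1} \mathbf{P}(\lambda_{i_j+k+1} - \lambda_{i_j+k} \ll n^{-c_0/2}).$$

The claim now follows from Theorem 19. (There is a slight issue when $2^m$ is comparable to $n$, so
that the index $i_j + k$ may leave the bulk; but then one works with, say, $\lambda_{i_j+2^{m-1}} - \lambda_{i_j}$
instead of $\lambda_{i_j+2^m} - \lambda_{i_j}$.) □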

Remark 50. One can also use Theorem 60 below to control all terms in the sum
with $|i - i_j| \ge \log^{C'} n$ for some $C'$, leading to a simpler proof of Lemma 49.

Of course, Lemma 49 also applies with $A_n$ replaced by $A'_n$.

Let $\tilde{G} : \mathbb{R}^k \times \mathbb{R}_+^k \to \mathbb{R}$ be the function

$$\tilde{G}(\lambda_{i_1}, \ldots, \lambda_{i_k}, Q_{i_1}, \ldots, Q_{i_k}) := G(\lambda_{i_1}, \ldots, \lambda_{i_k}) \prod_{j=1}^{k} \eta(Q_{i_j})$$

where $\eta(x)$ is a smooth cutoff to the region $x \le n^{c_0}$ which equals 1 on $x \le n^{c_0}/2$.
From (15) and the chain rule we see that

$$|\nabla^j \tilde{G}| \ll n^{c_0}$$

for $j = 0, 1, 2, 3, 4, 5$. Also, from Lemma 49 we have

$$|\mathbf{E}(G(\lambda_{i_1}(A_n), \ldots, \lambda_{i_k}(A_n))) - \mathbf{E}(\tilde{G}(\lambda_{i_1}(A_n), \ldots, \lambda_{i_k}(A_n), Q_{i_1}(A_n), \ldots, Q_{i_k}(A_n)))| \ll n^{-c}$$

for some $c > 0$, and similarly with $A_n$ replaced by $A'_n$. Thus (by choosing $c_0$ small
enough) to prove (16) it will suffice to show that the quantity

(35) $$\mathbf{E}(\tilde{G}(\lambda_{i_1}(A_n), \ldots, \lambda_{i_k}(A_n), Q_{i_1}(A_n), \ldots, Q_{i_k}(A_n)))$$

only changes by at most $n^{-c_0}/2$ (say) when one replaces $A_n$ by $A'_n$.

As discussed in Section 3.2, it will suffice to show that the quantity (35) changes
by at most $n^{-2-c_0}/4$ when one swaps the $\zeta_{pq}$ entry with $1 \le p < q \le n$ to $\zeta'_{pq}$ (and
$\zeta_{qp}$ with $\zeta'_{qp}$), and changes by at most $n^{-1-c_0}/4$ when one swaps a diagonal entry
$\zeta_{pp}$ with $\zeta'_{pp}$. But these claims follow from Proposition 48 and Proposition 46. (The
last part of Proposition 46 is used in the case when one only has three moments
matching rather than four.)

The proof of Theorem 15 is now complete (contingent on Theorem 19, Proposition
46, and Proposition 48).

3.4. Proof strategy for Theorem 19. We now informally discuss the proof of
Theorem 19.

The machinery of Erdős, Schlein, and Yau [17, 18, 19], which is useful in particular
for controlling the Stieltjes transform of Wigner matrices, will allow us to obtain
good lower bounds on the spectral gap $\lambda_i(A_n) - \lambda_{i-k}(A_n)$ in the bulk as soon as
$k \gg \log^{C'} n$ for a sufficiently large $C'$; see Theorem 60 for a precise statement.
The difficulty here is that $k$ is exactly 1. To overcome this difficulty, we will try to
amplify the value of $k$ by looking at the top left $(n-1) \times (n-1)$ minor $A_{n-1}$ of $A_n$,
and observing the following "backwards gap propagation" phenomenon:

If $\lambda_i(A_n) - \lambda_{i-k}(A_n)$ is very small, then $\lambda_i(A_{n-1}) - \lambda_{i-k-1}(A_{n-1})$ will also be
small with reasonably high probability.

If one accepts this phenomenon, then by iterating it about $\log^{C'} n$ times one can
enlarge the spacing $k$ to be of the size large enough so that an Erdős-Schlein-Yau
type bound can be invoked to obtain a contradiction. (There will be a technical
difficulty caused by the fact that the failure probability of this phenomenon, when
suitably quantified, can be as large as $1/\log^{O(1)} n$, thus apparently precluding the
ability to get a polynomially strong bound on the failure rate, but we will address
this issue later.)

Note that the converse of this statement follows from the Cauchy interlacing
property. To explain why this phenomenon is plausible, observe from the interlacing
property that if $\lambda_i(A_n) - \lambda_{i-k}(A_n)$ is small, then $\lambda_i(A_n) - \lambda_{i-k}(A_{n-1})$ is also small.
On the other hand, from Lemma 40 one has the identity

(36) $$\sum_{j=1}^{n-1} \frac{|u_j(A_{n-1})^* X|^2}{\lambda_j(A_{n-1}) - \lambda_i(A_n)} = \sqrt{n}\,\zeta_{nn} - \lambda_i(A_n),$$

where $X$ is the rightmost column of $A_n$ (with the bottom entry $\sqrt{n}\,\zeta_{nn}$ removed).

One expects $|u_j(A_{n-1})^* X|^2$ to have size about $n$ on the average (cf. Lemma 43).
In particular, if $\lambda_i(A_n) - \lambda_{i-k}(A_{n-1})$ is small (e.g. of size $O(n^{-c})$), then the $j = i - 1$
term is expected to give a large negative contribution (of size $\gg n^{1+c}$) to the left-hand
side of (36). Meanwhile, the right-hand side is much smaller, of size $O(n)$ or so on
the average; so we expect the large negative contribution mentioned earlier
to be counterbalanced by a large positive contribution from some other index. The
index which is most likely to supply such a large positive contribution is $j = i$,
and so one expects $\lambda_i(A_{n-1}) - \lambda_i(A_n)$ to be small (also of size $O(n^{-c})$, in fact). A
similar argument also leads one to expect $\lambda_{i-k}(A_n) - \lambda_{i-k-1}(A_{n-1})$ to be small, and
the claimed phenomenon then follows from the triangle inequality.

In order to make the above strategy rigorous, there are a number of technical
difficulties. The first is that the counterbalancing term mentioned above need not
come from $j = i$, but could instead come from another value of $j$, or perhaps a
"block" of several $j$ put together, and so one may have to replace the gap $\lambda_i(A_{n-1}) - \lambda_{i-k-1}(A_{n-1})$
by a more general type of gap. A second problem is that the gap
$\lambda_i(A_{n-1}) - \lambda_{i-k-1}(A_{n-1})$ is going to be somewhat larger than the gap $\lambda_i(A_n) - \lambda_{i-k}(A_n)$,
and one is going to be iterating this gap growth about $\log^{O(1)} n$ times.
In order to be able to contradict Theorem 60 at the end of the argument, the net
gap growth should only be $O(n^c)$ at most for some small $c > 0$. So one needs a
reasonable control on the ratio between the gap for $A_{n-1}$ and the gap for $A_n$; in
particular, if one can keep the former gap to be at most $(1 + \frac{1}{k})^{O(\log^{0.9} n)}$ times the
latter gap, then the net growth in the gap telescopes to $(\log^{O(1)} n)^{O(\log^{0.9} n)}$, which
is indeed less than $O(n^c)$ and thus acceptable. To address these issues, we fix a
base value $n_0$ of $n$, and for any $1 \le i - l < i \le n \le n_0$, we define the regularized
gap

(37) $$g_{i,l,n} := \inf_{1 \le i_- \le i-l < i \le i_+ \le n} \frac{\lambda_{i_+}(A_n) - \lambda_{i_-}(A_n)}{\min(i_+ - i_-, \log^{C_1} n_0)^{\log^{0.9} n_0}},$$

where $C_1 > 1$ is a large constant (depending on $C$) to be chosen later. (We need to
cap $i_+ - i_-$ off at $\log^{C_1} n_0$ to prevent the large values of $i_+ - i_-$ from overwhelming
the infimum, which is not what we want.)
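
To make the definition (37) concrete, the following is a minimal sketch that evaluates the regularized gap by brute force for a given sorted list of eigenvalues. The function name `regularized_gap` and the parameters `c1` (standing in for $C_1$) and `alpha` (standing in for the exponent 0.9) are illustrative assumptions, not notation from the text, and the toy sizes are far from the asymptotic regime the argument needs.

```python
import math


def regularized_gap(eigs, i, l, n0, c1=2.0, alpha=0.9):
    """Brute-force evaluation of g_{i,l,n} from (37).

    eigs holds the sorted eigenvalues lambda_1 <= ... <= lambda_n; the indices
    i and l follow the 1-based convention of the text. c1 and alpha are
    illustrative stand-ins for C_1 and the exponent 0.9.
    """
    n = len(eigs)
    if not (1 <= i - l < i <= n <= n0):
        raise ValueError("need 1 <= i - l < i <= n <= n0")
    cap = math.log(n0) ** c1        # the cap log^{C_1} n_0 on the window length
    expo = math.log(n0) ** alpha    # the exponent log^{0.9} n_0
    best = math.inf
    for i_minus in range(1, i - l + 1):      # 1 <= i_- <= i - l
        for i_plus in range(i, n + 1):       # i <= i_+ <= n
            width = min(i_plus - i_minus, cap)
            ratio = (eigs[i_plus - 1] - eigs[i_minus - 1]) / width ** expo
            best = min(best, ratio)
    return best


eigs = [0.0, 1.0, 1.5, 4.0]
g = regularized_gap(eigs, i=3, l=1, n0=4)
print(g)
```

Taking $i_- = i - l$ and $i_+ = i$ in the infimum shows that, whenever the capped window length is at least 1, the regularized gap is at most the raw gap $\lambda_i(A_n) - \lambda_{i-l}(A_n)$; this is why a lower bound on $g$ is the stronger statement.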

We will shortly establish a rigorous result that asserts, roughly speaking, that if
the gap $g_{i,l,n+1}$ is small, then the gap $g_{i,l+1,n}$ is also likely to be small, thus giving
a precise version of the phenomenon mentioned earlier.
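
The interlacing identity from Lemma 40 that drives the heuristic above can be checked by hand in the smallest case $n = 2$. The sketch below is only an illustration under simplifying assumptions: it takes a real symmetric matrix $[[a, x], [x, d]]$, whose $1 \times 1$ minor has the single eigenvalue $a$ with eigenvector 1, so the identity reduces to $x^2/(a - \lambda) = d - \lambda$ for each eigenvalue $\lambda$. The helper names are hypothetical, and $d$ plays the role of the bottom-right entry.

```python
import math


def eigenvalues_2x2(a, x, d):
    """Eigenvalues (ascending) of the real symmetric matrix [[a, x], [x, d]]."""
    mean, radius = (a + d) / 2.0, math.hypot((a - d) / 2.0, x)
    return mean - radius, mean + radius


def identity_residuals(a, x, d):
    """Residuals x^2 / (a - lam) - (d - lam) of the n = 2 identity, assuming x != 0."""
    return [x * x / (a - lam) - (d - lam) for lam in eigenvalues_2x2(a, x, d)]


a, x, d = 0.3, 1.2, -0.7
lam1, lam2 = eigenvalues_2x2(a, x, d)
print(lam1 <= a <= lam2)            # Cauchy interlacing for the 1 x 1 minor
print(identity_residuals(a, x, d))  # both residuals should be at rounding level
```

In this toy case the identity is just the characteristic equation $(a - \lambda)(d - \lambda) = x^2$; the general statement plays the same role with the minor's eigenbasis in place of the single vector.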

There is one final obstacle, which has to do with the failure probability when
$g_{i,l,n+1}$ is small but $g_{i,l+1,n}$ is large. If this event could be avoided with overwhelming
probability (or even a high probability), then one would be done by the union
bound (note we only need to take the union over $O(\log^{O(1)} n)$ different events).
While many of the events that could lead to failure can indeed be avoided with
high probability, there is one type of event which does cause a serious problem,
namely that the inner products $u_j(A_{n-1})^* X$ for $i_- \le j \le i_+$ could be unexpectedly
small. Talagrand's inequality (Lemma 43) can be used to control this event
effectively when $i_+ - i_-$ is large, but when $i_+ - i_-$ is small the probability of failure
can be as high as $1/\log^c n$ for some $c > 0$. However, one can observe that such high
failure rates only occur when $g_{i,l+1,n}$ is only slightly larger than $g_{i,l,n+1}$. Indeed,
one can show that the probability that $g_{i,l+1,n}$ is much higher than $g_{i,l,n+1}$, say of
size $2^m g_{i,l,n+1}$ or more, is only $O(2^{-m/2}/\log^c n)$ (for reasonable values of $m$), and in
fact (thanks to Talagrand's inequality) the constant $c$ can be increased to be much
larger when $l$ is large. This is still not quite enough for a union bound to give a total
failure probability of $O(n^{-c})$, but one can exploit the martingale-type structure of
the problem (or more precisely, the fact that the column $X$ remains random, with
independent entries, even after conditioning out all of the block $A_{n-1}$) to multiply
the various bad failure probabilities together to end up with the final bound of
$O(n^{-c})$.
3.5. High-level proof of Theorem 19. We now prove Theorem 19. Fix $\varepsilon, c_0$.

We write $i_0$ and $n_0$ for $i, n$, thus

$$\varepsilon n_0 \le i_0 \le (1 - \varepsilon) n_0$$

and the task is to show that $|\lambda_{i_0}(A_{n_0}) - \lambda_{i_0-1}(A_{n_0})| \ge n_0^{-c_0}$ with high probability.
We can of course assume that $n_0$ is large compared to all other parameters. We can
also assume the bound (27), and that the distribution of the $A_n$ is continuous, so
that events such as repeated eigenvalues occur with probability zero and can thus
be neglected.

We let $C_1$ be a large constant to be chosen later. For any $l, n$ with $1 \le i_0 - l < i_0 \le n \le n_0$,
we define the normalized gap $g_{i_0,l,n}$ by (37). It will suffice to show that

(38) $$g_{i_0,1,n_0} \ge n_0^{-c_0}$$

with high probability. As before, we let $u_1(A_n), \ldots, u_n(A_n)$ be an orthonormal
eigenbasis of $A_n$ associated to the eigenvalues $\lambda_1(A_n), \ldots, \lambda_n(A_n)$. We also let
$X_n \in \mathbb{C}^n$ be the rightmost column of $A_{n+1}$ with the bottom coordinate $\sqrt{n}\,\zeta_{n+1,n+1}$
removed.



The first main tool for this will be the following (deterministic) lemma, proven in
Section 6.

Lemma 51 (Backwards propagation of gap). Suppose that $n_0/2 \le n < n_0$ and
$l \le \varepsilon n/10$ is such that

(39) $$g_{i_0,l,n+1} \le \delta$$

for some $0 < \delta \le 1$ (which can depend on $n$), and that

(40) $$g_{i_0,l+1,n} \ge 2^m g_{i_0,l,n+1}$$

for some $m \ge 0$ with

(41) $$2^m \le \delta^{-1/2}.$$

Then one of the following statements holds:

(i) (Macroscopic spectral concentration) There exists $1 \le i_- < i_+ \le n+1$ with
$i_+ - i_- \ge \log^{C_1/2} n$ such that $|\lambda_{i_+}(A_{n+1}) - \lambda_{i_-}(A_{n+1})| \le \delta^{1/4} \exp(\log^{0.95} n)(i_+ - i_-)$.

(ii) (Small inner products) There exists $\varepsilon n/2 \le i_- \le i_0 - l < i_0 \le i_+ \le (1 - \varepsilon/2)n$
with $i_+ - i_- \le \log^{C_1/2} n$ such that

(42) $$\sum_{i_- \le j < i_+} |u_j(A_n)^* X_n|^2 \le \frac{n (i_+ - i_-)}{2^{m/2} \log^{0.01} n}.$$

(iii) (Large coefficient) We have

$$|\zeta_{n+1,n+1}| \ge n^{0.4}.$$

(iv) (Large eigenvalue) For some $1 \le i \le n + 1$ one has

^ nexp(-log° '^^n) 
|Ai(A„+ij| > ^Y72 ■ 
(v) (Large inner product in bulk) There exists $\varepsilon n/10 \le i \le (1 - \varepsilon/10)n$ such
that

\u^{An) Xn\ > ^Y72 ■ 

(vi) (Large row) We have 

n^exp(-log"-^^n) 
^ sTj2 • 

(vii) (Large inner product near $i_0$) There exists $\varepsilon n/10 \le i \le (1 - \varepsilon/10)n$ with
$|i - i_0| \le \log^{C_1} n$ such that

$$|u_i(A_n)^* X_n|^2 \ge 2^{m/2} n \log^{0.8} n.$$

Remark 52. In applications $\delta$ will be a small negative power of $n$. The main
bad event here is (ii) (and to a lesser extent, (vii)); the other events will have a
polynomially small probability of occurrence in practice (as a function of $n$) and
so can be easily discarded. The events (ii), (vii) are more difficult to discard, since
their probability is not polynomially small in $n$ if $m$ is small. On the other hand,
these probabilities decay exponentially in $m$, and furthermore are independent in
a martingale sense, and this will be enough for us to obtain a proper control.
The exact numerical values of the exponents such as 0.9, 0.95, 0.8, etc. are not
particularly important, though of course they need to lie between 0 and 1.



The second key proposition, proven in Section 7, bounds the probability that each
of the bad events (i)-(vii) occurs.

Proposition 53 (Bad events are rare). Suppose that $n_0/2 \le n < n_0$ and $l \le \varepsilon n/10$,
and set $\delta := n_0^{-\kappa}$ for some sufficiently small fixed $\kappa > 0$. Then:

(a) The events (i), (iii), (iv), (v), (vi) in Lemma 51 all fail with high probability.

(b) There is a constant $C'$ such that all the coefficients of the eigenvectors
$u_j(A_n)$ for $\varepsilon n/2 \le j \le (1 - \varepsilon/2)n$ are of magnitude at most $n^{-1/2} \log^{C'} n$
with overwhelming probability. Conditioning $A_n$ to be a matrix with this
property, the events (ii) and (vii) occur with a conditional probability of at
most $2^{-\kappa m} + n^{-\kappa}$.

(c) Furthermore, there is a constant $C_2$ (depending on $C', \kappa, C_1$) such that if
$l \ge C_2$ and $A_n$ is conditioned as in (b), then (ii) and (vii) in fact occur
with a conditional probability of at most $2^{-\kappa m} \log^{-2C_1} n + n^{-\kappa}$.

Let us assume these two propositions for now and conclude the proof of Theorem 19.

We may assume $c_0$ is small. Set $\kappa := c_0/10$. For each $n_0 - \log^{2C_1} n_0 \le n \le n_0$, let
$E_n$ be the event that one of the eigenvectors $u_j(A_n)$ for $\varepsilon n/2 \le j \le (1 - \varepsilon/2)n$ has
a coefficient of magnitude more than $n^{-1/2} \log^{C'} n$, and let $E_0$ be the event that at
least one of the exceptional events (i), (iii)-(vi), or $E_n$ holds for some $n_0 - \log^{2C_1} n_0 \le n \le n_0$;
then by Proposition 53 and the union bound we have

(43) $$\mathbf{P}(E_0) \le n_0^{-\kappa/2}$$

(say). It thus suffices to show that the event

$$g_{i_0,1,n_0} \le n_0^{-10\kappa} \wedge \overline{E_0}$$

is avoided with high probability.

To bound this, the first step is to increase the I parameter from 1 to C2, in order 
to use Proposition [SSF c). Set 2™ := np'''^^. From Proposition ISST b) . we see that 
the event that £"„ fails, but (ii) or (vii) holds for n = uq — 1, I ^ I, and some 
occurs with probability 0(2^"'" + tiq'^). Applying Lemma [5T] (noting that 
^-iok ^ conclude that 

P(<?.o,i,"o < no'"" A SS) < P(5.o,2,no-i < r^n^'"' A + 0(2—" + n^'^). 
We can iterate this process C2 times and conclude 

P(5.o,i.no < V'°"Ai?§) < P(g,„^c.+i,«o-c. < 2^^"n-i0«Ai?g)+O(C22— "+C2no-''); 
substituting in the definition of m, we conclude 

P(5.o,l.no < V'"" A < P(5,o,C. + l,no-C. < "0"'" A E^,) + n'/ ^^""^ 

(say). So it will suffice to show that 

P(5.o,c.+i,„o-c. < ng-^'^ A E^) < rC/l^''\ 

By Markov's inequality, it suffices to show that 
(44) ^Z-;Ll\{E-) < nf 

(say), where for each no — log*^^ no < n < no ~ C2, Zn the random variable 
Z„ := max(min(5io,„(,_„+i,„,^),n(7^''). 

Indeed, we have 

P(5.o.c.+i,no-c. < no"'" A SS) < P(Z„„_c. = ^'^ A i?S) 

< n-'^'/'EZ-^^ll{E^,) 

whence the claim. 

We now establish a recursive inequality for $\mathbf{E} Z_n^{-\kappa/2} \mathbf{I}(\overline{E_0})$. Let $n_0 - \log^{C_1} n_0 \le n < n_0 - C_2$.
Suppose we condition $A_n$ so that $E_n$ fails. Then for any $m \ge 0$, we
see from Proposition 53(c) that (ii) or (vii) holds for $l = n_0 - n$ and some $i_-, i_+$
with (conditional) probability at most $O(2^{-\kappa m} \log^{-2C_1} n + n^{-\kappa})$. Applying Lemma
51, we conclude that

$$\mathbf{P}(g_{i_0,n_0-n,n+1} \le \delta \wedge g_{i_0,n_0-n+1,n} \ge 2^m g_{i_0,n_0-n,n+1} \wedge \overline{E_0} \mid A_n) \ll 2^{-\kappa m} \log^{-2C_1} n + n^{-\kappa}.$$

Note that this inequality is also vacuously true if $A_n$ is such that $E_n$ holds, since
the event $\overline{E_0}$ is then empty.

Observe that if $Z_n > 2^m Z_{n+1}$ for some $m \ge 0$ then $g_{i_0,n_0-n,n+1} \le \delta \wedge g_{i_0,n_0-n+1,n} > 2^m g_{i_0,n_0-n,n+1}$. Thus

$$\mathbf{P}(Z_n > 2^m Z_{n+1} \wedge \overline{E_0} \mid A_n) \ll 2^{-\kappa m} \log^{-2C_1} n + n^{-\kappa}$$

or equivalently

Since we are conditioning on An, Zn is deterministic. Also, from the definition of 
Zn, this event is vacuous for 2™ > n^", thus we can simphfy the above bound as 



n 



Now we multiply this by 2""^/^ and sum over m > to obtain 

E(Z„7f I(i?§)|A„) < Z-^/'il + log-(2^-i) n) 
(say). Undoing the conditioning on An, we conclude 

E(Z„7f < (l + log-(2'^-i)r.)E(Z-''/2). 

Applying gS]) (and the trivial bound Z^"^^ < n^""/'^) we have 

E(Z„7/^I(£;5)) < (1 + log-(2^-i) n)E{Z--/'liE^,)) + n~'^l' 
(say). Iterating this we conclude that 

(45) E(Z„7_/^^I(i?S)) < 2E(Z;;_/^,^^,, I(£;S)) + n-l' 

(say). 



On the other hand, if holds, then by (i) we have 

^(«o^Liog^i "oj) _ ;^(no-Liog^i «oJ)| ^ „-«exp(log°-95 n)(*+ - *_) 

whenever 1 < i- < * ^ log'^^^^ < * !i *+ 5: ^- From this we have 



5io,LlogCl noJ+l,no-Llog^i noj - ""0 



(say) and hence 
Inserting this into (45) we obtain (44) as required.






4. Good configurations have stable spectra 



The purpose of this section is to prove Proposition 46. The first stage is to obtain
some equations for the derivatives of eigenvalues of Hermitian matrices with respect
to perturbations.



4.1. Spectral dynamics of Hermitian matrices. Suppose that $\lambda_i(A)$ is a simple
eigenvalue, which means that $\lambda_i(A) \ne \lambda_j(A)$ for all $j \ne i$; note that almost all
Hermitian matrices have simple eigenvalues. We then define $P_i(A)$ to be the orthogonal
projection to the one-dimensional eigenspace corresponding to $\lambda_i(A)$; thus, if $u_i(A)$
is a unit eigenvector for the eigenvalue $\lambda_i(A)$, then $P_i(A) = u_i(A) u_i(A)^*$. We also
define the resolvent $R_i(A)$ to be the unique Hermitian matrix inverting $A - \lambda_i(A) I$
on the range of $I - P_i(A)$, and vanishing on the range of $P_i(A)$. If $u_1(A), \ldots, u_n(A)$
form an orthonormal eigenbasis associated to the $\lambda_1(A), \ldots, \lambda_n(A)$, we can write
$R_i(A)$ explicitly as

$$R_i(A) = \sum_{j \ne i} \frac{1}{\lambda_j(A) - \lambda_i(A)} u_j(A) u_j(A)^*.$$

It is clear that

(46) $$R_i (A - \lambda_i I) = I - P_i.$$

We also need the quantity

$$Q_i(A) := \|R_i(A)\|_F^2 = \sum_{j \ne i} \frac{1}{|\lambda_j(A) - \lambda_i(A)|^2}.$$

By the Weyl inequalities, each eigenvalue function $A \mapsto \lambda_i(A)$ for $1 \le i \le n$ is continuous.
However, we will need a quantitative control on the derivatives of this function. The
first observation is that $\lambda_i$ (and $P_i, R_i, Q_i$) depend smoothly on $A$ whenever that
eigenvalue is simple (even if other eigenvalues have multiplicity):

Lemma 54. Let $1 \le i \le n$, and let $A_0$ be a Hermitian matrix which has a simple
eigenvalue at $\lambda_i(A_0)$. Then $\lambda_i$, $P_i$, $R_i$, and $Q_i$ are smooth for $A$ in a neighborhood
of $A_0$.

Proof. By the Weyl inequality, $\lambda_i(A)$ stays away from the other $\lambda_j(A)$ by
a bounded distance for all $A$ in a neighborhood of $A_0$. In particular, the characteristic
polynomial $\det(A - \lambda I)$ has a simple zero at $\lambda_i(A)$ for all such $A$. Since
this polynomial also depends smoothly on $A$, the smoothness of $\lambda_i$ now follows.
As $A - \lambda_i(A) I$ depends smoothly on $A$, has a single zero eigenvalue, and has all
other eigenvalues bounded away from zero, we see that the one-dimensional kernel
$\ker(A - \lambda_i(A) I)$ also depends smoothly on $A$ near $A_0$. Since $P_i$ is the orthogonal
projection to this kernel, the smoothness of $P_i$ now follows.

Since $A - \lambda_i(A) I$ depends smoothly on $A$, and has eigenvalues bounded away from
zero on the range of $I - P_i$ (which also depends smoothly on $A$), we see that $R_i$
(and hence $Q_i$) also depend smoothly on $A$. □
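
The objects $P_i$, $R_i$, $Q_i$ defined at the start of this subsection can be written down in closed form for a $2 \times 2$ real symmetric matrix, which gives a quick numerical check of the identity (46). The sketch below is an illustration only: the helper names `matmul` and `spectral_data` are hypothetical, the matrix is $[[a, b], [b, d]]$ with $b \ne 0$, and $i$ is taken to be the index of the larger eigenvalue.

```python
import math


def matmul(x, y):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(x[r][t] * y[t][c] for t in range(2)) for c in range(2)] for r in range(2)]


def spectral_data(a, b, d):
    """Return (lam, P, R, Q) for the top eigenvalue of [[a, b], [b, d]], assuming b != 0."""
    mean, radius = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    lam_lo, lam_hi = mean - radius, mean + radius
    # A unit eigenvector for lam_hi is proportional to (b, lam_hi - a).
    norm = math.hypot(b, lam_hi - a)
    u = (b / norm, (lam_hi - a) / norm)
    P = [[u[r] * u[c] for c in range(2)] for r in range(2)]     # P_i = u u^*
    gap = lam_lo - lam_hi                                       # lambda_j - lambda_i
    # In dimension 2, I - P_i is the projection onto the other eigenvector.
    R = [[((1.0 if r == c else 0.0) - P[r][c]) / gap for c in range(2)] for r in range(2)]
    Q = 1.0 / gap ** 2                                          # squared Frobenius norm of R_i
    return lam_hi, P, R, Q


a, b, d = 2.0, 1.0, -1.0
lam, P, R, Q = spectral_data(a, b, d)
lhs = matmul(R, [[a - lam, b], [b, d - lam]])   # R_i (A - lambda_i I), to compare with I - P_i
print(lhs, Q)
```

Here $Q_i$ blows up exactly when the two eigenvalues approach each other, which is the quantity the later lemmas need to keep under control.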

Now we turn to more quantitative estimates on the smoothness of $\lambda_i, P_i, R_i, Q_i$ for
fixed $1 \le i \le n$. For our applications to Proposition 46, we consider matrices $A = A(z)$
which are parameterized smoothly (though not holomorphically, of course)
by some complex parameter $z$ in a domain $\Omega \subset \mathbb{C}$. We assume that $\lambda_i(A(z))$
is simple for all $z \in \Omega$, which by the above lemma implies that $\lambda_i := \lambda_i(A(z))$,
$P_i := P_i(A(z))$, $R_i := R_i(A(z))$, and $Q_i := Q_i(A(z))$ all depend smoothly on $z$ in $\Omega$.

It will be convenient to introduce some more notation, to deal with the technical
fact that $z$ is complex-valued rather than real. For any smooth function $f(z)$ (which
may be scalar, vector, or matrix-valued), we use

$$\nabla^m f(z) := \left( \frac{\partial^m f}{\partial \mathrm{Re}(z)^l \, \partial \mathrm{Im}(z)^{m-l}} \right)_{l=0}^{m}$$

to denote the $m$-th gradient with respect to the real and imaginary parts of $z$ (thus
$\nabla^m f$ is an $(m+1)$-tuple, each of whose components is of the same type as $f$; for
instance, if $f$ is matrix-valued, so are all the components of $\nabla^m f$). If $f$ is matrix-valued,
we define $\|\nabla^m f\|_F$ to be the $\ell^2$ norm of the Frobenius norms of the various
components of $\nabla^m f$, and similarly for other norms.

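We observe the Leibniz rule

(47) $$\nabla^k (fg) = \sum_{m=0}^{k} (\nabla^m f) * (\nabla^{k-m} g) = f (\nabla^k g) + (\nabla^k f) g + \sum_{m=1}^{k-1} (\nabla^m f) * (\nabla^{k-m} g)$$

where the $(k+1)$-tuple $(\nabla^m f) * (\nabla^{k-m} g)$ is defined as

$$\left( \sum_{\max(0, l+m-k) \le l' \le \min(l, m)} \binom{l}{l'} \binom{k-l}{m-l'} \frac{\partial^m f}{\partial \mathrm{Re}(z)^{l'} \partial \mathrm{Im}(z)^{m-l'}} \, \frac{\partial^{k-m} g}{\partial \mathrm{Re}(z)^{l-l'} \partial \mathrm{Im}(z)^{k-m-l+l'}} \right)_{l=0}^{k}.$$

The exact coefficients here are not important, and one can view $(\nabla^m f) * (\nabla^{k-m} g)$
simply as a bilinear combination of $\nabla^m f$ and $\nabla^{k-m} g$. Note that (47) is valid for
matrix-valued $f, g$ as well as scalar $f, g$. For a tuple $(A_1, \ldots, A_l)$ of matrices, we
define

$$\mathrm{trace}(A_1, \ldots, A_l) := (\mathrm{trace}(A_1), \ldots, \mathrm{trace}(A_l)).$$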
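
As a sanity check on the coefficients in the Leibniz rule (47), one can verify that in each component the binomial weights of $(\nabla^m f) * (\nabla^{k-m} g)$ add up to $\binom{k}{m}$, as in the one-variable Leibniz rule; this is Vandermonde's identity. The sketch below is illustrative only, and the function name `leibniz_weight` is an assumption rather than notation from the text.

```python
from math import comb


def leibniz_weight(k, m, l):
    """Sum of the binomial coefficients in the l-th component of (grad^m f) * (grad^{k-m} g)."""
    lo, hi = max(0, l + m - k), min(l, m)
    return sum(comb(l, lp) * comb(k - l, m - lp) for lp in range(lo, hi + 1))


# Vandermonde's identity predicts the total weight binom(k, m), independently of l.
checks = [leibniz_weight(k, m, l) == comb(k, m)
          for k in range(0, 8) for m in range(0, k + 1) for l in range(0, k + 1)]
print(all(checks))
```

This is consistent with the remark that only the bilinear structure of the product matters in what follows: the weights depend on $k$ alone, which is why all implied constants are allowed to depend on $k$.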

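We can now give the higher order Hadamard variation formulae:

Proposition 55 (Recursive formula for derivatives of $\lambda_i, P_i, R_i$). For any integer
$k \ge 1$, we have

(48) $$\nabla^k \lambda_i = \sum_{m=1}^{k} \mathrm{trace}((\nabla^m A) * (\nabla^{k-m} P_i) P_i) - \sum_{m=1}^{k-1} (\nabla^m \lambda_i) * \mathrm{trace}((\nabla^{k-m} P_i) P_i)$$

and

(49) $$\nabla^k P_i = -R_i (\nabla^k A) P_i - P_i (\nabla^k A) R_i - \sum_{m=1}^{k-1} \left[ R_i ((\nabla^m A) - (\nabla^m \lambda_i) I) * (\nabla^{k-m} P_i) P_i + P_i (\nabla^{k-m} P_i) * ((\nabla^m A) - (\nabla^m \lambda_i) I) R_i \right] + \sum_{m=1}^{k-1} (\nabla^m P_i) * (\nabla^{k-m} P_i)(I - 2 P_i),$$

(50) $$(\nabla^k R_i) P_i = -\sum_{m=0}^{k-1} (\nabla^m R_i) * (\nabla^{k-m} P_i)$$

and

(51) $$(\nabla^k R_i)(I - P_i) = -(\nabla^k P_i) R_i - \sum_{m=0}^{k-1} (\nabla^m R_i) * ((\nabla^{k-m} A) - (\nabla^{k-m} \lambda_i) I) R_i$$

and thus

(52) $$\nabla^k R_i = -(\nabla^k P_i) R_i - \sum_{m=0}^{k-1} (\nabla^m R_i) * ((\nabla^{k-m} A) - (\nabla^{k-m} \lambda_i) I) R_i - \sum_{m=0}^{k-1} (\nabla^m R_i) * (\nabla^{k-m} P_i).$$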

Proof. Our starting point is the identities

(53) $$\lambda_i P_i = A P_i$$

and

(54) $$P_i P_i = P_i.$$

We differentiate these identities $k$ times using the Leibniz rule (47) to obtain

(55) $$(\nabla^k \lambda_i) P_i + \sum_{m=0}^{k-1} (\nabla^m \lambda_i) * (\nabla^{k-m} P_i) = \sum_{m=0}^{k} (\nabla^m A) * (\nabla^{k-m} P_i)$$

and

(56) $$(\nabla^k P_i) P_i + P_i (\nabla^k P_i) + \sum_{m=1}^{k-1} (\nabla^m P_i) * (\nabla^{k-m} P_i) = \nabla^k P_i.$$

Multiplying (55) by $P_i$ and taking traces one obtains (48) (the $m = 0$ terms cancel
because of (53), which implies that $\mathrm{trace}(A (\nabla^k P_i) P_i) = \lambda_i \, \mathrm{trace}((\nabla^k P_i) P_i)$).

We next compute $\nabla^k P_i$ using the decomposition

(57) $$\nabla^k P_i = P_i (\nabla^k P_i) P_i + (I - P_i)(\nabla^k P_i)(I - P_i) + (I - P_i)(\nabla^k P_i) P_i + P_i (\nabla^k P_i)(I - P_i).$$

Multiplying both sides of (56) by $P_i$ (on the right) and using the identity $P_i P_i = P_i$,
we get a cancellation which implies

(58) $$P_i (\nabla^k P_i) P_i = -\sum_{m=1}^{k-1} (\nabla^m P_i) * (\nabla^{k-m} P_i) P_i.$$

Repeating the same trick with $I - P_i$ instead of $P_i$, we have

(59) $$(I - P_i)(\nabla^k P_i)(I - P_i) = \sum_{m=1}^{k-1} (\nabla^m P_i) * (\nabla^{k-m} P_i)(I - P_i).$$

This gives two of the four components of $\nabla^k P_i$. To obtain the other components,
multiply (55) on the left by $I - P_i$ and notice that the $(I - P_i)(\nabla^k \lambda_i) P_i$ term
vanishes because of (54). Rearranging the terms, we obtain

$$(I - P_i)(A - \lambda_i)(\nabla^k P_i) = -\sum_{m=1}^{k-1} (I - P_i)((\nabla^m A) - (\nabla^m \lambda_i) I) * (\nabla^{k-m} P_i) - (I - P_i)(\nabla^k A) P_i.$$

Applying $R_i$ on the left and $P_i$ on the right and using (46), we get

$$(I - P_i)(\nabla^k P_i) P_i = -\sum_{m=1}^{k-1} R_i ((\nabla^m A) - (\nabla^m \lambda_i) I) * (\nabla^{k-m} P_i) P_i - R_i (\nabla^k A) P_i.$$

By taking adjoints, we obtain

$$P_i (\nabla^k P_i)(I - P_i) = -\sum_{m=1}^{k-1} P_i (\nabla^{k-m} P_i) * ((\nabla^m A) - (\nabla^m \lambda_i) I) R_i - P_i (\nabla^k A) R_i.$$

These, together with (58), (59) and (57), imply (49).

Now we turn to $R_i$. Here, we use the identities (cf. (46))

$$R_i (A - \lambda_i I) = I - P_i$$

and

$$R_i P_i = 0.$$

Differentiating the second identity $k$ times gives (50). Differentiating the first
identity $k$ times, meanwhile, gives

$$(\nabla^k R_i)(A - \lambda_i I) = -(\nabla^k P_i) - \sum_{m=0}^{k-1} (\nabla^m R_i) * ((\nabla^{k-m} A) - (\nabla^{k-m} \lambda_i) I);$$

multiplying on the right by $R_i$, we obtain (51), and then (52) follows. □

We isolate the $k = 1$ case of Proposition 55, obtaining the Hadamard variation
formulae

(60) $$\nabla \lambda_i = \mathrm{trace}((\nabla A) P_i)$$

and

(61) $$\nabla P_i = -R_i (\nabla A) P_i - P_i (\nabla A) R_i.$$
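
The first Hadamard variation formula (60) is easy to test numerically along a real line of matrices. The following is a minimal sketch under simplifying assumptions: it uses $2 \times 2$ real symmetric matrices stored as triples $(a, b, d)$ for $[[a, b], [b, d]]$, a real parameter $t$ in place of the complex parameter $z$, and hypothetical helper names; it compares $\mathrm{trace}(B P_i) = u^* B u$ with a central finite difference of the top eigenvalue of $A_0 + tB$.

```python
import math


def top_eigpair(a, b, d):
    """Largest eigenvalue and a unit eigenvector of [[a, b], [b, d]], assuming b != 0."""
    lam = (a + d) / 2.0 + math.hypot((a - d) / 2.0, b)
    norm = math.hypot(b, lam - a)
    return lam, (b / norm, (lam - a) / norm)


def top_eig_along_line(t, A0, B):
    """Top eigenpair of A(t) = A0 + t * B, with matrices stored as (a, b, d) triples."""
    return top_eigpair(*(x + t * y for x, y in zip(A0, B)))


A0, B, t, h = (1.0, 0.5, -2.0), (0.3, -0.4, 0.9), 0.2, 1e-5
lam, (ux, uy) = top_eig_along_line(t, A0, B)
# Right-hand side of (60): trace(B P) = u^* B u for the rank-one projection P = u u^*.
predicted = B[0] * ux * ux + 2.0 * B[1] * ux * uy + B[2] * uy * uy
# Central finite difference of the eigenvalue in the real parameter t.
numeric = (top_eig_along_line(t + h, A0, B)[0] - top_eig_along_line(t - h, A0, B)[0]) / (2.0 * h)
print(abs(predicted - numeric))
```

The two numbers should agree up to the finite-difference error; the higher-order formulae of Proposition 55 refine this first-order picture and bring in the resolvent $R_i$, which is where the spectral gap enters.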
4.2. Bounding the derivatives. We now use the recursive inequalities obtained
in the previous section to bound the derivatives of $\lambda_i$ and $P_i$, assuming some
quantitative control on the spectral gap between $\lambda_i$ and other eigenvalues, and on the
matrix $A$ and its derivatives. Let us begin with a crude bound.

Lemma 56 (Crude bound). Let $A = A(z)$ be an $n \times n$ matrix varying (real)-linearly
in $z$ (thus $\nabla^k A = 0$ for $k \ge 2$), with

$$\|\nabla A\|_{op} \le V$$

for some $V > 0$. Let $1 \le i \le n$. At some fixed value of $z$, suppose we have the
spectral gap condition

(62) $$|\lambda_j(A(z)) - \lambda_i(A(z))| \ge r$$

for all $j \ne i$ and some $r > 0$ (in particular, $\lambda_i(A(z))$ is a simple eigenvalue). Then
for all $k \ge 1$ we have (at this fixed choice of $z$)

(63) $$|\nabla^k \lambda_i| \ll_k V^k r^{1-k}$$

and

(64) $$\|\nabla^k P_i\|_{op} \ll_k V^k r^{-k}$$

and

(65) $$\|\nabla^k R_i\|_{op} \ll_k V^k r^{-k-1}$$

and

(66) $$|\nabla^k Q_i| \ll_k n V^k r^{-k-2}.$$

Proof. Observe that the spectral gap condition (62) ensures that

(67) $$\|R_i\|_{op} \le \frac{1}{r}.$$

We also observe the easy inequality

(68) $$|\mathrm{trace}(B P_i)| = |\mathrm{trace}(P_i B)| = |\mathrm{trace}(P_i B P_i)| \le \|B\|_{op}$$

for any Hermitian matrix $B$, which follows as $P_i$ is a rank one orthogonal projection.

To prove (63), (64), we induct on $k$. The case $k = 1$ follows from (60), (61)
and (67); and then, for $k > 1$, the claim follows from the induction hypotheses and
(48), (49), (67), (68).

To prove (65), we also induct on $k$. The case $k = 0$ follows from (67). For $k \ge 1$,
the claim then follows from the induction hypotheses and (52).

To prove (66), we use the product rule to bound

$$|\nabla^k Q_i| \ll_k \sum_{m=0}^{k} |\mathrm{trace}((\nabla^m R_i) * (\nabla^{k-m} R_i))| \ll_k n \sum_{m=0}^{k} \|\nabla^m R_i\|_{op} \|\nabla^{k-m} R_i\|_{op}$$

and the claim follows from (65). □
This crude bound is insufficient for our applications, and we will need to
supplement it with one that strengthens the spectral condition, and also assumes a
"delocalization" property for the projections $P_\alpha, P_\beta$ relative to the perturbation $\nabla A$.

Lemma 57 (Better bound). Let $A = A(z)$ be an $n \times n$ matrix varying real-linearly
in $z$. Let $1 \le i \le n$. At some fixed value of $z$, suppose that $\lambda_i = \lambda_i(A(z))$ is a
simple eigenvalue, and that we have a partition

$$I = P_i + \sum_{\alpha \in J} P_\alpha$$

where $J$ is a finite index set (not containing $i$), and $P_\alpha$ are orthogonal projections
to invariant spaces of $A$ (i.e. to spans of eigenvectors not corresponding to $\lambda_i$).
Suppose that on the range of each $P_\alpha$, the eigenvalues of $A - \lambda_i$ have magnitude at
least $r_\alpha$ for some $r_\alpha > 0$; equivalently, we have

(69) $$\|R_i P_\alpha\|_{op} \le \frac{1}{r_\alpha}.$$

Suppose also that we have the delocalization bounds

(70) $$\|P_\alpha (\nabla A) P_\beta\|_F \le v c_\alpha c_\beta$$

for all $\alpha, \beta \in J \cup \{i\}$ and some $v > 0$ and $c_\alpha \ge 1$ with $c_i = 1$, and the strong spectral gap
condition

(71) $$\sum_{\alpha \in J} \frac{c_\alpha^2}{r_\alpha} \le L$$

for some $L > 0$. Then at this fixed choice of $z$, and for all $\alpha, \beta \in J$, we have

(72) $$|\nabla^k \lambda_i| \ll_k L^{k-1} v^k$$

(73) $$\|P_i (\nabla^k P_i) P_i\|_F \ll_k L^k v^k$$

(74) $$\|P_\alpha (\nabla^k P_i) P_i\|_F = \|P_i (\nabla^k P_i) P_\alpha\|_F \ll_k \frac{c_\alpha}{r_\alpha} L^{k-1} v^k$$

(75) $$\|P_\alpha (\nabla^k P_i) P_\beta\|_F \ll_k \frac{c_\alpha c_\beta}{r_\alpha r_\beta} L^{k-2} v^k$$

for all $k \ge 1$, and

(76) $$\|P_i (\nabla^k R_i) P_i\|_F \ll_k L^{k+1} v^k$$

(77) $$\|P_\alpha (\nabla^k R_i) P_i\|_F = \|P_i (\nabla^k R_i) P_\alpha\|_F \ll_k \frac{c_\alpha}{r_\alpha} L^k v^k$$

(78) $$\|P_\alpha (\nabla^k R_i) P_\beta\|_F \ll_k \frac{c_\alpha c_\beta}{r_\alpha r_\beta} L^{k-1} v^k$$

for all $k \ge 0$.

We remark that we can unify the bounds (73)-(75) and (76)-(78) by allowing $\alpha, \beta$
to vary in $J \cup \{i\}$ rather than $J$, and adopting the convention that $r_i := 1/L$.



Proof. Note that the projections $P_i$ and the $P_\alpha$ are idempotent and all annihilate
each other, and commute with $A$ and $R_i$. We will use these facts throughout this
proof without further comment.

To prove (72)-(75), we again induct on $k$. When $k = 1$, the claim (72) follows
from (60) and (70), while (73), (74), (75) follow from (61), (69), and (70). (For (73) and (75) we
in fact obtain that the left-hand side is zero.)

Now suppose inductively that $k > 1$, and that the claims have already been proven
for all smaller values of $k$.

We first prove (72). From (48) and the linear nature of $A$, we have

$$|\nabla^k \lambda_i| \ll_k |\mathrm{trace}((\nabla A) * (\nabla^{k-1} P_i) P_i)| + \sum_{m=1}^{k-1} |\nabla^m \lambda_i| \, |\mathrm{trace}((\nabla^{k-m} P_i) P_i)|.$$

From the inductive hypothesis (73) we have

$$|\mathrm{trace}((\nabla^{k-m} P_i) P_i)| \ll_k L^{k-m} v^{k-m}$$

for any $1 \le m \le k - 1$, and thus by the inductive hypothesis (72) we see that

$$\sum_{m=1}^{k-1} |\nabla^m \lambda_i| \, |\mathrm{trace}((\nabla^{k-m} P_i) P_i)| \ll_k L^{k-1} v^k.$$

Next, by splitting $(\nabla A) * (\nabla^{k-1} P_i) P_i$ as $\sum_{\alpha \in J \cup \{i\}} (\nabla A) P_\alpha * (\nabla^{k-1} P_i) P_i$, we have

$$|\mathrm{trace}((\nabla A) * (\nabla^{k-1} P_i) P_i)| \ll_k \sum_{\alpha \in J \cup \{i\}} \|P_i (\nabla A) P_\alpha\|_F \|P_\alpha (\nabla^{k-1} P_i) P_i\|_F.$$

Using (70) and the inductive hypotheses (73), (74), we thus have

$$|\mathrm{trace}((\nabla A) * (\nabla^{k-1} P_i) P_i)| \ll_k v L^{k-1} v^{k-1} + \sum_{\alpha \in J} v c_\alpha \frac{c_\alpha}{r_\alpha} L^{k-2} v^{k-1}.$$

Applying (71) we conclude (72) as desired.
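Now we prove (73). From (49) (or (58)) we have

$$\|P_i (\nabla^k P_i) P_i\|_F \ll_k \sum_{m=1}^{k-1} \|P_i (\nabla^m P_i) * (\nabla^{k-m} P_i) P_i\|_F.$$

We can split

$$\|P_i (\nabla^m P_i) * (\nabla^{k-m} P_i) P_i\|_F \ll_k \sum_{\alpha \in J \cup \{i\}} \|P_i (\nabla^m P_i) P_\alpha\|_F \|P_\alpha (\nabla^{k-m} P_i) P_i\|_F.$$

Applying the inductive hypotheses (73), (74), we conclude

$$\|P_i (\nabla^k P_i) P_i\|_F \ll_k \sum_{m=1}^{k-1} \left[ L^m v^m L^{k-m} v^{k-m} + \sum_{\alpha \in J} \frac{c_\alpha}{r_\alpha} L^{m-1} v^m \frac{c_\alpha}{r_\alpha} L^{k-m-1} v^{k-m} \right];$$

bounding one of the $\frac{c_\alpha}{r_\alpha}$ factors crudely by $L$ and then summing using (71) we obtain
(73) as desired.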
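Now we prove (75). From (49) (or (59)) we have

$$\|P_\alpha (\nabla^k P_i) P_\beta\|_F \ll_k \sum_{m=1}^{k-1} \|P_\alpha (\nabla^m P_i) * (\nabla^{k-m} P_i) P_\beta\|_F \ll_k \sum_{m=1}^{k-1} \sum_{\gamma \in J \cup \{i\}} \|P_\alpha (\nabla^m P_i) P_\gamma\|_F \|P_\gamma (\nabla^{k-m} P_i) P_\beta\|_F.$$

Using the inductive hypotheses (74), (75), we obtain

$$\|P_\alpha (\nabla^k P_i) P_\beta\|_F \ll_k \sum_{m=1}^{k-1} \left[ \frac{c_\alpha}{r_\alpha} L^{m-1} v^m \frac{c_\beta}{r_\beta} L^{k-m-1} v^{k-m} + \sum_{\gamma \in J} \frac{c_\alpha c_\gamma}{r_\alpha r_\gamma} L^{m-2} v^m \frac{c_\gamma c_\beta}{r_\gamma r_\beta} L^{k-m-2} v^{k-m} \right].$$

Bounding one of the $\frac{c_\gamma}{r_\gamma}$ factors crudely by $L$ and applying (71) we obtain (75) as
desired.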

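Finally, we prove (74). Since $P_\alpha (\nabla^k P_i) P_i$ has the same Frobenius norm as its
adjoint $P_i (\nabla^k P_i) P_\alpha$, it suffices to bound $\|P_\alpha (\nabla^k P_i) P_i\|_F$. From (49) we have

$$\|P_\alpha (\nabla^k P_i) P_i\|_F \ll_k \|R_i P_\alpha (\nabla A) * (\nabla^{k-1} P_i) P_i\|_F + \sum_{m=1}^{k-1} |\nabla^m \lambda_i| \, \|R_i P_\alpha (\nabla^{k-m} P_i) P_i\|_F$$

and hence by (69)

$$\|P_\alpha (\nabla^k P_i) P_i\|_F \ll_k \frac{1}{r_\alpha} \|P_\alpha (\nabla A) * (\nabla^{k-1} P_i) P_i\|_F + \frac{1}{r_\alpha} \sum_{m=1}^{k-1} |\nabla^m \lambda_i| \, \|P_\alpha (\nabla^{k-m} P_i) P_i\|_F.$$

From the inductive hypotheses (72), (74) and crudely bounding $\frac{1}{r_\alpha}$ by $L$, we have

$$\frac{1}{r_\alpha} \sum_{m=1}^{k-1} |\nabla^m \lambda_i| \, \|P_\alpha (\nabla^{k-m} P_i) P_i\|_F \ll_k \frac{c_\alpha}{r_\alpha} L^{k-1} v^k.$$

Meanwhile, by splitting

$$\|P_\alpha (\nabla A) * (\nabla^{k-1} P_i) P_i\|_F \le \sum_{\beta \in J \cup \{i\}} \|P_\alpha (\nabla A) P_\beta\|_F \|P_\beta (\nabla^{k-1} P_i) P_i\|_F$$

and using (70) and the inductive hypotheses (73), (74) we have

$$\frac{1}{r_\alpha} \|P_\alpha (\nabla A) * (\nabla^{k-1} P_i) P_i\|_F \ll_k \frac{1}{r_\alpha} \left( v c_\alpha L^{k-1} v^{k-1} + \sum_{\beta \in J} v c_\alpha c_\beta \frac{c_\beta}{r_\beta} L^{k-2} v^{k-1} \right).$$

Applying (71) we obtain (74) as required.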



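Having proven (72)-(75) for all $k \ge 1$, we now prove (76)-(78) by induction. The
claim is easily verified for $k = 0$ (note that the left-hand side of (76) and (77) in
fact vanishes, as does the left-hand side of (78) unless $\alpha = \beta$), so suppose $k \ge 1$
and that (76)-(78) has been proven for smaller values of $k$.

We first prove (76). From (50) we have

$$\|P_i (\nabla^k R_i) P_i\|_F \ll_k \sum_{m=0}^{k-1} \|P_i (\nabla^m R_i) * (\nabla^{k-m} P_i) P_i\|_F \ll_k \sum_{m=0}^{k-1} \sum_{\alpha \in J \cup \{i\}} \|P_i (\nabla^m R_i) P_\alpha\|_F \|P_\alpha (\nabla^{k-m} P_i) P_i\|_F.$$

Applying (73), (74) and the induction hypotheses (76), (77) we conclude

$$\|P_i (\nabla^k R_i) P_i\|_F \ll_k \sum_{m=0}^{k-1} \left[ L^{m+1} v^m L^{k-m} v^{k-m} + \sum_{\alpha \in J} \frac{c_\alpha}{r_\alpha} L^m v^m \frac{c_\alpha}{r_\alpha} L^{k-m-1} v^{k-m} \right];$$

crudely bounding one of the $\frac{c_\alpha}{r_\alpha}$ factors by $L$ and using (71) we obtain the claim.

Similarly, to prove (77), we apply (50) as before to obtain

$$\|P_\alpha (\nabla^k R_i) P_i\|_F \ll_k \sum_{m=0}^{k-1} \|P_\alpha (\nabla^m R_i) * (\nabla^{k-m} P_i) P_i\|_F \ll_k \sum_{m=0}^{k-1} \sum_{\beta \in J \cup \{i\}} \|P_\alpha (\nabla^m R_i) P_\beta\|_F \|P_\beta (\nabla^{k-m} P_i) P_i\|_F;$$

applying (73), (74) and the induction hypotheses (77), (78) we conclude

$$\|P_\alpha (\nabla^k R_i) P_i\|_F \ll_k \sum_{m=0}^{k-1} \left[ \frac{c_\alpha}{r_\alpha} L^m v^m L^{k-m} v^{k-m} + \sum_{\beta \in J} \frac{c_\alpha c_\beta}{r_\alpha r_\beta} L^{m-1} v^m \frac{c_\beta}{r_\beta} L^{k-m-1} v^{k-m} \right].$$

Again, bounding one of the $\frac{c_\beta}{r_\beta}$ factors by $L$ and using (71) we obtain the claim.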



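Finally, we prove (78). From (51) we have

$$\|P_\alpha (\nabla^k R_i) P_\beta\|_F \ll_k \|P_\alpha (\nabla^k P_i) R_i P_\beta\|_F + \|P_\alpha (\nabla^{k-1} R_i) * (\nabla A) R_i P_\beta\|_F + \sum_{m=0}^{k-1} |\nabla^{k-m} \lambda_i| \, \|P_\alpha (\nabla^m R_i) R_i P_\beta\|_F.$$

As $R_i P_\beta = P_\beta R_i P_\beta$ has an operator norm of at most $\frac{1}{r_\beta}$, we conclude

$$\|P_\alpha (\nabla^k R_i) P_\beta\|_F \ll_k \frac{1}{r_\beta} \|P_\alpha (\nabla^k P_i) P_\beta\|_F + \frac{1}{r_\beta} \|P_\alpha (\nabla^{k-1} R_i) * (\nabla A) P_\beta\|_F + \frac{1}{r_\beta} \sum_{m=0}^{k-1} |\nabla^{k-m} \lambda_i| \, \|P_\alpha (\nabla^m R_i) P_\beta\|_F.$$

From (72), the crude bound $\frac{1}{r_\beta} \le L$, and the induction hypothesis (78) we have

$$\frac{1}{r_\beta} \sum_{m=0}^{k-1} |\nabla^{k-m} \lambda_i| \, \|P_\alpha (\nabla^m R_i) P_\beta\|_F \ll_k \frac{c_\alpha c_\beta}{r_\alpha r_\beta} L^{k-1} v^k.$$

From (75) and $\frac{1}{r_\beta} \le L$ we similarly have

$$\frac{1}{r_\beta} \|P_\alpha (\nabla^k P_i) P_\beta\|_F \ll_k \frac{c_\alpha c_\beta}{r_\alpha r_\beta} L^{k-1} v^k.$$

Finally, splitting

$$\|P_\alpha (\nabla^{k-1} R_i) * (\nabla A) P_\beta\|_F \le \sum_{\gamma \in J \cup \{i\}} \|P_\alpha (\nabla^{k-1} R_i) P_\gamma\|_F \|P_\gamma (\nabla A) P_\beta\|_F$$

and using the induction hypotheses (77), (78) and (70), we obtain

$$\|P_\alpha (\nabla^{k-1} R_i) * (\nabla A) P_\beta\|_F \ll_k \frac{c_\alpha}{r_\alpha} L^{k-1} v^{k-1} \cdot v c_\beta + \sum_{\gamma \in J} \frac{c_\alpha c_\gamma}{r_\alpha r_\gamma} L^{k-2} v^{k-1} \cdot v c_\gamma c_\beta$$

and thus by (71)

$$\|P_\alpha (\nabla^{k-1} R_i) * (\nabla A) P_\beta\|_F \ll_k \frac{c_\alpha c_\beta}{r_\alpha} L^{k-1} v^k.$$

Putting all this together we obtain (78) as claimed. □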



We extract a special case of the above lemma, in which the perturbation only
affects a single entry (and its transpose):

Corollary 58 (Better bound, special case). Let $A = A(z)$ be an $n \times n$ matrix
depending on a complex parameter $z$ of the form

$$A(z) = A(0) + z e_p e_q^* + \bar{z} e_q e_p^*$$

for some vectors $e_p, e_q$. Let $1 \le i \le n$. At some fixed value of $z$, suppose that
$\lambda_i = \lambda_i(A(z))$ is a simple eigenvalue, and that we have a partition

$$I = P_i + \sum_{\alpha \in J} P_\alpha$$

where $J$ is a finite index set, and $P_\alpha$ are orthogonal projections to invariant spaces
of $A$ (i.e. to spans of eigenvectors not corresponding to $\lambda_i$). Suppose that on the
range of each $P_\alpha$, the eigenvalues of $A - \lambda_i$ have magnitude at least $r_\alpha$ for some
$r_\alpha > 0$. Suppose also that we have the incompressibility bounds

$$\|P_\alpha e_p\|, \|P_\alpha e_q\| \le w d_\alpha^{1/2}$$

for all $\alpha \in J \cup \{i\}$ and some $w > 0$ and $d_\alpha \ge 1$. Then at this value of $z$, and for
all $k \ge 1$, we have

(79) $$|\nabla^k \lambda_i| \ll_k \left( \sum_{\alpha \in J} \frac{d_\alpha}{r_\alpha} \right)^{k-1} w^{2k}$$

and

(80) $$|\nabla^k Q_i| \ll_k \left( \sum_{\alpha \in J} \frac{d_\alpha}{r_\alpha} \right)^{k+2} w^{2k}$$

for all $k \ge 0$ at this value of $z$.



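Proof. A short computation shows that the hypotheses of Lemma 57 are obeyed
with $v$ replaced by $O(w^2)$, $c_\alpha$ set equal to $d_\alpha^{1/2}$, and $L$ equal to $O(\sum_{\alpha \in J} \frac{d_\alpha}{r_\alpha})$. From
(72) we then conclude (79). As for (80), we see from the product rule that

$$|\nabla^k Q_i| \ll_k \sum_{m=0}^{k} |\mathrm{trace}((\nabla^m R_i) * (\nabla^{k-m} R_i))|$$

which we can split further using Cauchy-Schwarz as

$$|\nabla^k Q_i| \ll_k \sum_{m=0}^{k} \sum_{\alpha, \beta \in J \cup \{i\}} \|P_\alpha (\nabla^m R_i) P_\beta\|_F \|P_\alpha (\nabla^{k-m} R_i) P_\beta\|_F.$$

Applying (76)-(78) we conclude

$$|\nabla^k Q_i| \ll_k L^{k+2} v^k + \sum_{\alpha \in J} \frac{c_\alpha^2}{r_\alpha^2} L^k v^k + \sum_{\alpha, \beta \in J} \frac{c_\alpha^2 c_\beta^2}{r_\alpha^2 r_\beta^2} L^{k-2} v^k.$$

Bounding one of the factors of $\frac{c_\alpha}{r_\alpha}$ in the first sum by $L$, and $\frac{c_\alpha c_\beta}{r_\alpha r_\beta}$ in the second
sum by $L^2$, and using (71) and the choices for $v$ and $L$, we obtain the claim. □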

4.3. Conclusion of the argument. We can now prove Proposition 46. Fix $k \ge 1$,
$r = 2, 3, 4$ and $\varepsilon_1 > 0$, and suppose that $C_1$ is sufficiently large. We assume
$A(0), e_p, e_q, i_1, \ldots, i_k, G, F, \zeta, \zeta'$ are as in the proposition.

We may of course assume that $F(z_0) \ne 0$ for at least one $z_0$ with $|z_0| \le n^{1/2+\varepsilon_1}$,
since the claim is vacuous otherwise.

Suppose we can show that

(81) $$\nabla^m F(z) = O(n^{-m+O(\varepsilon_1)})$$

for all $|z| \le n^{1/2+\varepsilon_1}$ and $0 \le m \le 5$. Then by Taylor expansion, one has

$$F(\zeta) = P(\zeta, \bar\zeta) + O(n^{-(r+1)/2+O(\varepsilon_1)})$$

where P is a polynomial of degree at most r whose coefficients are of size at most 
^O(ei)^ Taking expectations for both F(C) and F{(^'), we obtain the claim ([M)) 
when CiC- A similar argument gives the improved version of p4p at the end of 
Proposition |46] if one can improve the right-hand side of (jSTj) to 0{n^"^~^ ™'^^) for 
some sufficiently large absolute constant C^. 

It remains to show (|8ip . By up to five applications of the chain rule, the above 
claims follow from 

Lemma 59. Suppose that F{zq) ^ for at least one zq with \zq\ < n^/^+^i . Then 
for all z with \zo\ < n^^'^'^'^'- , and all 1 < j < k, we have 

|V*''A,^.(z)|«fcn5-'i''Vi-^- 

and 

|V'=Q,^(z)|«fc 
for all z with \z\ < nVa+ei and allO <k <10. 

Proof. Fix j. Since F{zo) ^ 0, we have 

(82) Q^,iA{zo)) <n'K 

For a technical reason having to do with a subsequent iteration argument, we will 
replace ([5^ with the slightly weaker bound 

(83) Q^,{A{zo))<2n'K 



By the definition of Qi, we have, as a consequence, that 
(84) \XAM^o)) - A.,(A(zo))| » 
for all $i' \ne i_j$.
By the Weyl inequalities, we thus have



\XAA{z))-K^{A{z))\:^n-'^/^ 
whenever \z — zq\ <C n^^^'^^'^ . From Lemma [56l we conclude that 

(85) |V"A,^.(A(z))| «„ n^if^+D/^ 
and 

(86) |V'"g,^.(A(z))| «„ n^i(™+2)/2n 

for all m > 1, whenever |z — zq\ <C n^^^^'^^. In particular, from (|83p and the 
fundamental theorem of calculus we have 

(87) Q,^.(A(z)) 
for all such z. 

Note that by setting Ci sufficiently large, we can find z such that |z — zqI ^ 
Ti^^^^'^i, |z| < n^/^+'^i, and that the real and imaginary parts of z are integer 
multiples of n~'^^ . Then (|32p . ((33)) holds for this value of 2. Applying Corollarv[58l 
we conclude that 

|V'=A,^.(z)|«fen2-i'=n-^( V —)'^~' 

0<a<log n 

and 

|V^-Q,^.(z)|«,n2^^'=n-'=( V -)'=+2 
for all fc > 1, where is the minimal value of |Ai — Xi^ \ for |i — Jj| > 2". Note that 



Meanwhile, from (1571) we have 

Ai — Aj,- p 



^1 \Xi — Xi- 

0<O!<log n i^ij ^ 



From Cauchy-Schwarz this implies that 

E TT-^ « 

^ Ai-Aj. 

while from pip we have (with room to spare) 

E n .\. « ^''^ 



and thus 



and so 



E ^«""^ 

0<a<log n 



|V'^'A,^.(z)|«fcn5^i^n-'= 
and



(88) |V'''Q,^(z)|«fcn^^i("+2)n-'= 



for all z with \z - zo| < and < fc < 10. 

This establishes the lemma in a ball $B(z_0, n^{-1-2\varepsilon_1})$ of radius $n^{-1-2\varepsilon_1}$ centered at
$z_0$. To extend the result to the remainder of the region $\{z : |z| \le n^{1/2+\varepsilon_1}\}$, we
observe from (88) that $Q_{i_j}$ varies by at most $O(n^{-1.9})$ (say) on this ball (instead
of 1.9 we can write any constant less than 2, given that $\varepsilon_1$ is sufficiently small).
Because of the gap between (82) and (83), we now see that (83) continues to hold
for all other points $z_1$ in $B(z_0, n^{-1-2\varepsilon_1})$ with $|z_1| \le n^{1/2+\varepsilon_1}$. Repeating the above
arguments with $z_0$ replaced by $z_1$, and continuing this process, we can eventually
cover the entire ball $\{z : |z| \le n^{1/2+\varepsilon_1}\}$ by these estimates. The key point here is
that at every point of the process (83) holds, since the length of the process is only
$n^{3/2+3\varepsilon_1}$, while in each step the value of $Q_{i_j}$ changes by at most $O(n^{-1.9})$. □

The proof of Proposition 251 is now complete. 



5. Good configurations occur frequently

The purpose of this section is to prove Proposition 48. The arguments here are
largely based on those in [17], [18], [19].

5.1. Reduction to a concentration bound for the empirical spectral distribution.
We will first reduce matters to the following concentration estimate for
the empirical spectral distribution:

Theorem 60 (Concentration for ESD). For any $\varepsilon, \delta > 0$ and any random Hermitian
matrix $M_n = (\zeta_{ij})_{1 \le i,j \le n}$ whose upper-triangular entries are independent with
mean zero and variance 1, and such that $|\zeta_{ij}| \le K$ almost surely for all $i,j$ and
some $1 \le K \le n^{1/2-\varepsilon}$, and any interval $I$ in $[-2+\varepsilon, 2-\varepsilon]$ of width $|I| \ge \frac{K^2 \log^{20} n}{n}$,
the number of eigenvalues $N_I$ of $W_n := \frac{1}{\sqrt{n}} M_n$ in $I$ obeys the concentration estimate

$\left| N_I - n \int_I \rho_{sc}(x)\, dx \right| \le \delta n |I|$

with overwhelming probability. In particular, $N_I = \Theta_\varepsilon(n|I|)$ with overwhelming
probability.

Remark 61. Similar results were established in [17], [18], [19] assuming stronger
regularity hypotheses on the $\zeta_{ij}$. The proof of this result follows their approach, but
also uses Lemma 43 and a few other ideas which make the current more general setting
possible. In our applications we will take $K = \log^{O(1)} n$, though Theorem 60 also
has non-trivial content for larger values of $K$. The loss of $\log^{20} n$ can certainly
be improved, though for our applications any bound which is polylogarithmic for
$K = \log^{O(1)} n$ will suffice.

Let us assume Theorem 60 for the moment. We can then conclude a useful bound
on eigenvectors (which will also be applied to prove Theorem 19):

Proposition 62 (Delocalization of eigenvectors). Let $\varepsilon, M_n, W_n, \zeta_{ij}$ be as in
Theorem 60. Then for any $1 \le i \le n$ with $\lambda_i(W_n) \in [-2+\varepsilon, 2-\varepsilon]$, if $u_i(W_n)$ denotes
a unit eigenvector corresponding to $\lambda_i(W_n)$, then with overwhelming probability each

coordinate of Ui{Mn) is Oe{ — °i% " )■ 

Proof. By symmetry and the union bound, it suffices to establish this for the first 
coordinate of Ui{Wn)- By Lemma HTl it suffices to establish a lower bound 

n— 1 

Y.(MWn-l) - A,(Ty„))-2|l,,(M/„_i)*^X|2 

^ \Jn log n 

with overwhelming probability, where Wn-i is the bottom right n — 1 x n — 1 minor 
of Wn and X e C"~^ has entries Cii for i = 2, . . . , n. But by Theorem 15(11 we can 
(with overwhelming probability) find a set J C {1, . . . , n— 1} with \ J\ log^*^ n 

such that \\j{Wn-i) - \^{Wn)\ <e " for all n e J. Thus it will suffice to 

show that 

^|u,(W„_i)*X|2», |J| 

with overwhelming probability. The left-hand side can be written as ||7r//X||^, 
where H is the span of all the eigenvectors associated to J. The claim now follows 
from Lemma 1321 D 



We also have the following minor variant:

Corollary 63. The conclusions of Theorem 60 and Proposition 62 continue to hold
if one replaces a single diagonal entry $\zeta_{pp}$ of $M_n$ by a deterministic real number
$x = O(K)$, or if one replaces a single off-diagonal entry $\zeta_{pq}$ of $M_n$ by a deterministic
complex number $z = O(K)$ (and also replaces $\zeta_{qp}$ with $\bar z$).

Proof. After the indicated replacement, the new matrix $M_n'$ differs from the original
matrix by a Hermitian matrix of rank at most 2. The modification of Theorem 60
then follows from Theorem 60 and Lemma 39. The modification of Proposition 62
then follows by repeating the proof. (One of the coefficients of X might now be de- 
terministic rather than random, but it is easy to see that this does not significantly 
impact Lemma 1321) D 

Now we can prove Proposition l48l Let e, ei, C, Ci, k,ii, . . . , ik,P, q, A{0) be as in 
that proposition. By the union bound we may fix f < j < /c, and also fix the \z\ < 
^1/2+61 -^pj-^ose real and imaginary parts are multiples of n~^^ . By the union bound 
again and Corollary [531 (with K = log*" n), the eigenvalue separation condition ([21]) 
holds with overwhelming probability for every $1 \le i \le n$ with $|i - i_j| \ge n^{\varepsilon_1}$, as
does (32) (note that $\|P_{i_j}(A(z)) e_p\|$ is the magnitude of the $p$-th coordinate of a unit
eigenvector $u_{i_j}(A(z))$ of $A(z)$). A similar argument using Pythagoras' theorem gives
(|33p with overwhelming probability, unless the eigenvalues Xi{A{z)) contributing to 
(|33)) are not contained in the bulk region [(—2 + e')n, (2 — e')n\ for some e' > 
independent of n. However, it is known (see [25 ; one can also deduce this fact 
from Theorem I60|) that Xi{A{z)) will fall in this bulk region with overwhelming 
probability whenever < i < (1 — |)n (say), if e' is small enough depending on 
s. Thus, with overwhelming probability, a contribution outside the bulk region can 
only occur if 2" n, in which case the claim follows by estimating ||Pi ._c((A(z))ep|j 
crudely by |jep|| = 1, and similarly for ||Pi._a(^(2))eq||. This concludes the proof of 
Proposition [48] assuming Theorem [60l 



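5.2. Spectral concentration. It remains to prove Theorem 60.
Following [17], [18], [19], we consider the Stieltjes transform

$s_n(z) := \frac{1}{n} \sum_{i=1}^n \frac{1}{\lambda_i(W_n) - z}$

of $W_n$, together with its semicircular counterpart

$s(z) := \int_{-2}^{2} \frac{\rho_{sc}(x)}{x - z}\, dx$

(which was computed explicitly in (107)). We will primarily be interested in the
imaginary part

(89) $\mathrm{Im}\, s_n(x + \sqrt{-1}\eta) = \frac{1}{n} \sum_{i=1}^n \frac{\eta}{\eta^2 + (\lambda_i(W_n) - x)^2} > 0$

of the Stieltjes transform in the upper half-plane $\eta > 0$.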
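
The identity between the imaginary part of the Stieltjes transform and the Poisson-kernel sum can be checked numerically. The following is a minimal sketch; the spectrum `eigs` and the point `(x, eta)` are hypothetical values chosen only for illustration and do not come from the paper.

```python
def stieltjes(eigs, z):
    """Empirical Stieltjes transform (1/n) * sum_i 1 / (lambda_i - z)."""
    n = len(eigs)
    return sum(1.0 / (lam - z) for lam in eigs) / n


def poisson_form(eigs, x, eta):
    """Poisson-kernel sum (1/n) * sum_i eta / (eta^2 + (lambda_i - x)^2)."""
    n = len(eigs)
    return sum(eta / (eta * eta + (lam - x) ** 2) for lam in eigs) / n


# Hypothetical spectrum and evaluation point, used only for illustration.
eigs = [-1.7, -0.4, 0.1, 0.9, 1.6]
x, eta = 0.3, 0.05

lhs = stieltjes(eigs, complex(x, eta)).imag  # Im s_n(x + i*eta)
rhs = poisson_form(eigs, x, eta)             # Poisson-kernel form
```

Both quantities agree up to rounding, and both are positive whenever `eta > 0`, which is the positivity used repeatedly in this section.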

It is well known that the convergence of the empirical spectral distribution of Wn 
to Psc{x) is closely tied to the convergence of s„ to s (see [5], for example). In 
particular, we have the following precise connection (cf. [TT] Corollary 4.2]), whose 
proof is deferred to Appendix [Cl 

Lemma 64 (Control of Stieltjes transform implies control on ESD). Let $1/10 \ge \eta \ge 1/n$,
and $L, \varepsilon, \delta > 0$. Suppose that one has the bound

(90) $|s_n(z) - s(z)| \le \delta$

with (uniformly) overwhelming probability for all $z$ with $|\mathrm{Re}(z)| \le L$ and $\mathrm{Im}(z) \ge \eta$.
Then for any interval $I$ in $[-L+\varepsilon, L-\varepsilon]$ with $|I| \ge \max(2\eta, \frac{\eta}{\delta} \log \frac{1}{\delta})$, one has

$\left| N_I - n \int_I \rho_{sc}(x)\, dx \right| \ll_\varepsilon \delta n |I|$

with overwhelming probability.
In view of this lemma, it suffices to show that for each complex number $z$ with
$\mathrm{Re}(z) \le 2 - \varepsilon/2$ and $\mathrm{Im}(z) \ge \eta := \frac{K^2 \log^{19} n}{n}$, one has

$|s_n(z) - s(z)| \le o(1)$

with (uniformly) overwhelming probability.
Fix $z$ as above. From (107), $s(z)$ is the unique solution to the equation

(91) $s(z) + \frac{1}{s(z) + z} = 0$

with $\mathrm{Im}\, s(z) > 0$. The strategy is then to obtain a similar equation for $s_n(z)$ (note
that one automatically has $\mathrm{Im}(s_n(z)) > 0$).
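
As a sanity check on the self-consistent equation for the semicircular Stieltjes transform, the sketch below evaluates $s(z) = \frac{1}{2}(-z + \sqrt{z^2-4})$ and verifies that $s(z) + 1/(s(z)+z)$ vanishes with $\mathrm{Im}\, s(z) > 0$. The way the branch of the square root is selected (writing $\sqrt{z^2-4} = z\sqrt{1-4/z^2}$ with the principal square root) and the sample points are assumptions made for this illustration.

```python
import cmath


def s_semicircle(z):
    """Stieltjes transform of the semicircle law, (-z + sqrt(z^2 - 4)) / 2.

    The square root is taken as z * sqrt(1 - 4/z^2) with the principal
    branch, which is asymptotic to z at infinity and has its cut on [-2, 2].
    """
    w = z * cmath.sqrt(1 - 4 / (z * z))
    return (-z + w) / 2


def residual(z):
    """Size of s(z) + 1/(s(z) + z), which should vanish."""
    s = s_semicircle(z)
    return abs(s + 1 / (s + z))


# Hypothetical sample points in the upper half-plane.
points = [complex(0.5, 0.1), complex(-1.9, 0.01), complex(0.0, 2.0)]
residuals = [residual(z) for z in points]
imag_parts = [s_semicircle(z).imag for z in points]
```

The other root of the quadratic $s^2 + zs + 1 = 0$ has negative imaginary part in the upper half-plane, which is why the positivity condition singles out $s(z)$.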

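By Lemma 42, we may write

(92) $s_n(z) = \frac{1}{n} \sum_{k=1}^n \frac{1}{\frac{1}{\sqrt{n}}\zeta_{kk} - z - Y_k}$

where

$Y_k := a_k^* (W_{n,k} - zI)^{-1} a_k$,

$W_{n,k}$ is the matrix $W_n$ with the $k$-th row and column removed, and $a_k$ is the $k$-th
row of $W_n$ with the $k$-th element removed.

The entries of $a_k$ are independent of each other and of $W_{n,k}$, and have mean zero
and variance $\frac{1}{n}$. By linearity of expectation we thus have, on conditioning on $W_{n,k}$,

$\mathbf{E}(Y_k | W_{n,k}) = \frac{1}{n} \mathrm{trace}(W_{n,k} - zI)^{-1} = (1 - \frac{1}{n}) s_{n,k}(z)$

where

$s_{n,k}(z) := \frac{1}{n-1} \sum_{i=1}^{n-1} \frac{1}{\lambda_i(W_{n,k}) - z}$

is the Stieltjes transform of $W_{n,k}$. From the Cauchy interlacing law (24) we have

$s_n(z) - (1 - \frac{1}{n}) s_{n,k}(z) = O\left( \frac{1}{n} \int_{\mathbb{R}} \frac{1}{|x - z|^2}\, dx \right) = O\left( \frac{1}{n\eta} \right)$

and thus

(93) $\mathbf{E}(Y_k | W_{n,k}) = s_n(z) + O\left( \frac{1}{K^2 \log^{19} n} \right)$.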
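
The Schur complement identity behind (92) can be checked on a small example: each diagonal entry of $(W - zI)^{-1}$ equals $1/(w_{kk} - z - Y_k)$. The sketch below is an illustration only; the $3 \times 3$ Hermitian matrix `W`, the point `z`, and the helper names are hypothetical, and the linear solver is a plain Gaussian elimination written for self-containment.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def resolvent_diag(w, z, k):
    """k-th diagonal entry of (W - zI)^{-1}, computed directly."""
    n = len(w)
    shifted = [[w[i][j] - (z if i == j else 0) for j in range(n)] for i in range(n)]
    e_k = [1.0 if i == k else 0.0 for i in range(n)]
    return solve(shifted, e_k)[k]


def schur_form(w, z, k):
    """1 / (w_kk - z - Y_k), with Y_k = a_k^* (W_k - zI)^{-1} a_k."""
    n = len(w)
    idx = [i for i in range(n) if i != k]
    minor = [[w[i][j] - (z if i == j else 0) for j in idx] for i in idx]
    a_k = [complex(w[i][k]) for i in idx]  # k-th column without its k-th entry
    y_k = sum(a.conjugate() * v for a, v in zip(a_k, solve(minor, a_k)))
    return 1 / (w[k][k] - z - y_k)


# Hypothetical Hermitian matrix and spectral parameter.
W = [[0.3, 0.5 + 0.2j, -0.1j],
     [0.5 - 0.2j, -0.4, 0.7],
     [0.1j, 0.7, 0.2]]
z = complex(0.3, 0.2)
direct = [resolvent_diag(W, z, k) for k in range(3)]
via_schur = [schur_form(W, z, k) for k in range(3)]
```

Averaging the entries of `direct` gives the Stieltjes transform of `W` at `z`, which is the content of (92) before any normalization by $\sqrt{n}$.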

We now claim that a similar estimate holds for Yk itself: 

Proposition 65 (Concentration of Yk). For each 1 < k < n, one has Yk 
Sn{z) + 0{j^^) with overwhelming probability. 

Assume this proposition for the moment. By hypothesis, $|\frac{1}{\sqrt{n}}\zeta_{kk}| \le K/\sqrt{n} \le n^{-\varepsilon}$
almost surely. Inserting these bounds into (92), we see that

$s_n(z) + \frac{1}{n} \sum_{k=1}^n \frac{1}{s_n(z) + z + o(1)} = 0$

with overwhelming probability (compare with (91)). This implies that with
overwhelming probability either $s_n(z) = s(z) + o(1)$ or that $s_n(z) = -z + o(1)$. On the
other hand, as $\mathrm{Im}\, s_n(z)$ is necessarily positive, the second possibility can only occur
when $\mathrm{Im}\, z = o(1)$. A continuity argument (as in [18]) then shows that the second
possibility cannot occur at all (note that $s(z)$ stays a fixed distance away from $-z$
for $z$ in a compact set) and the claim follows.



5.3. A preliminary concentration bound. It remains to prove Proposition 65.
We begin with a preliminary bound (cf. [19, Theorem 5.1]):

Proposition 66. For all I CR with \I\ > ^'''"g"" , one has 

Ni < n\I\ 

with overwhelming probability. 



The proof, which follows the arguments from [19], but using Lemma 43 to simplify
things somewhat, is presented in Appendix C.

Now we prove Proposition 65. Fix $k$, and write $z = x + \sqrt{-1}\eta$. From (93) it
suffices to show that



log n J 

with overwhelming probability. Decomposing as in (|114p , it thus suffices to show 
that 

(94) y ^ ^ = o' 



-f^ Aj(W^„,fe) - (a; + V^?7) , 
with overwhelming probability, where Rj :— |uj(M/^n,/c)*afeP — Xjn. 

Let 1 < < i+ < ?^, then 

where H is the space spanned by the UjiyVn^k)* for i_ < j < i+. From Lemma [ 
and the union bound, we conclude that with overwhelming probability 



(95) I E R^\<<^IIIE]S}^^^^1J^^}^, 

By the triangle inequality, this implies that 

lie. 112//*+-*- , \/«+ -i-KXogn + K'^ Xog n 

> \\PHak\\ < \ 

^-^ n n 

and hence by a further application of the triangle inequality 

(96) E 

i-<j<i+ 
with overwhelming probability.

Since rj > K'^ log^^ n/n, the bound (together with Proposition [55)) already 
lets one dispose of the contribution to (|94)) where \Xj{Wn,k) — x\ < \og^^ n/n. 
For the remaining contributions, we subdivide into 0(log'^ n) intervals {j : i_ < 
j < i+} such that in each interval a < \Xj{Wn,k) — x\ < (1 + i^^2^ )o, for some 

a > K^\og^^n/n (the value of a varies from interval to interval). For each such 

has magnitude 0{^) and fluctuates by at 



interval, the function . — r — — , 
most O( niog:i „ ) as j ranges over the interval. From (j95|) . (|96l) we conclude 



E 



\j{Wn.k)-{x + V^V) 



< — ( - i^K log n + ii"^ log^ n) 



an 



an log n 



with overwhelming probability. By Proposition l661 — i_ ^ an with overwhelming 
probability. Thus we have 



E 

i- <3<i- 



if log n 



1 



log'^ n 



with overwhelming probability. Summing over the values of a (taking into account 
the lower bound a) we obtain (|M)) as desired. 



6. Propagation of narrow spectral gaps 



We now prove Lemma 51. Fix $i_0, l, n$. Assume for contradiction that all of the
conclusions fail. We will always assume that $n_0$ (and hence $n$) is sufficiently large.



By (|37l) . we can find 1 < < ig — I < < i+ < n + 1 such that 

Xi^iAn+i) - Ai_(yl„+i) = 5i(,_;,„+i min(i+ -«_,log'^' no)'°*^'"' 

If i+ — i- > log'^^^^n, then conclusion (i) holds (for n large enough), so we may 
assume that 

(97) i+ - i_ < log'^i/^n. 
We set 

(98) L := V(v4„+i) ~ \_iAn+i) = g^o,l,n+li^+ - 1-)'°^°'^. 
In particular (by ([5^ . (^7])) we have 

(99) L < 5exp(log"-^%). 

We now study the eigenvalue equation (j25p for z = z_, which we rearrange as 



E 



|u,(yl„)*^„|' 



E 



|uj(A„)*X„ 



+ an+l,n+l — Ai_ (yl„+i). 
Observe that



Since conclusion (ii) fails, we have 



A,(A„)-A,_(yl„+i) - 2W2LlogO-oin' 



On the other hand, since conclusions (iii), (iv) fail, we have 



rt exp(— log"'^^ n) — i_) 



|a„+i.n+i - Ai_(A„+i)| < < /''^ixioj-"-°i 

angle 



ji/2 - 2'"/2+iiiog" 

thanks to the bounds i+ — > 1, (pij) and (|99p . By the triangle inequality, we 
thus have 



E 



A»_(A„+i)-Aj(A„) - 2W2+1LI0J 



,0.01 , 



n 



Note that all the summands on the left-hand side are non-negative. By a dyadic 
partition and the pigeonhole principle (using the convergence of the series 1/^^), we 
can thus find fc > 1 such that 

(.r.r.. sr \uj{An)*Xn\^ n(i+-i^) 

^ ' .,,,A,_(A„+i)-A,(A„)>>2W2Lfc2iog00i„- 

In particular 2*^^^ < i^. 

Let's first suppose that 2^^^ > log'^^^^n. Then by the failure of conclusion (i), 
we have 

A._ (A„+i) - A,(A„) > 51/4 exp(log°-95 n)2'=-i 
for all j in the summation in (jlOOp , and thus (by (|4T1) , (j99| and the trivial bounds 

?■+ — j- > 1 and k = O(logn)) 

(101) 

l<j"<i_ :2'=-i<i_-j<2'= *= 

On the other hand, from the failure of conclusion (v), we have 
(102) |.,(A„)*X„P < -exp(-log°--^n) 

for en/10 < J < (1 — e/10)n. This already contradicts (|101l) when the range of 
summation in (|10ip is contained in the bulk region en/10 < j < (1 — e/10)n. The 
only remaining case is when (jlOip approaches the edge, which only occurs when 
2^ ^ en. But in this case we note from Pythagoras' theorem and the failure of 
conclusion (vi) that 

n 9 / 1 96 \ 
leading again to a contradiction with (101). We may therefore assume that $2^{k-1} <
\log^{C_1/2} n$, thus $k = O(C_1 \log\log n)$.

By the failure of conclusion (vii), we now have |mj(A„)*X„|2 < 2'"/2n log"'^ n for 
all i in the summation in (|100p ; we conclude that 

r,.<._. KMnJ)-HAn) 2™Llog°-n- 
If we set i := i- — 2'^^^, we conclude that < i- — i < log'^^^^ n and 



A._(A„+i) - A,__(A„) < 2'"- — Llog" «^n. 

— 2_ 



An analogous argument, starting with i — ij^'vci ([25]) instead oil — and reflecting 
all the indices, allows us to find i++ with < i++ — z+ < log'^^^^ n such that 

KMn) - KMn+i) < 2'"^— ^Llog"-«3n. 

Summing, we have 

(103) A,;+^ (A„) - X,__{An) < L{1 + 2"a log°-«4 n) 

so by (1371), dig 



where a :— 1. Note that (1 + — «-) = H+ — i < log iV, and 



(l + a)log°-=W(j^_,_)log««JV 

Combining this with (|103p . (IM)) we conclude that 

l + 2"alog°-«^n> 2"(l + a)'°g'"^ 

and hence 

l + alog°-«^n> + 

But this contradicts the elementary estimate (l + a)^ > 1 + xa for a > and x > 1, 
and Lemma [ST] follows. 



7. Bad events are rare 



We now prove Proposition 53. Let the notation and assumptions be as in that
proposition.

We first prove (a). The truncation assumption (27) ensures that the events (iii),
(v), (vi) from Proposition 51 are empty for $n$ large enough. The event (i) fails
with overwhelming probability thanks to Theorem 60. The event (iv) fails with
overwhelming probability because of the well-known fact that the operator norm of
$A_n$ is $O(n)$ with overwhelming probability (see e.g. [1]; there are many proofs, for
instance one can start by observing that $\|A_n\|_{op} \le 2 \sup_x \|A_n x\|$, where $x$ ranges
over a 1/2-net of the unit ball, and use the union bound followed by a standard
concentration of measure result, such as the Chernoff inequality). This concludes
the proof of (a).

Now we prove (b) and (c) jointly. By (27) and Proposition 62 we can find $C$ such
that all the coefficients of the eigenvectors $u_j(A_n)$ for $\varepsilon n/2 \le j \le (1 - \varepsilon/2)n$ are of
magnitude at most $n^{-1/2} \log^C n$ with overwhelming probability.

Let us first consider (vii) , in which we will be able to obtain the better upper bound 
Q-f 2-nm2-'2Cin -j-j-^g conditional probability of occurrence (thus establishing (b) 
and (c) simultaneously for (vii)). If 2™ > log*^^ n for some sufficiently large C3, 
then the desired bound comes from (P71) and Lemma 1321 (In fact, the Chernoff 
bound would suffice as well, and the event fails with overwhelming probability.) 
Now suppose instead that 2™ < log'^*-"'^^ n. We wish to show that 

(104) PdS*,! > 2™/2log°-^n) < 2-""log"2'^^ n, 
where $S_i \in \mathbb{C}$ is the random walk

(105) $S_i := \zeta_{1,n+1} w_{i,1} + \dots + \zeta_{n,n+1} w_{i,n}$

and $w_{i,1}, \dots, w_{i,n}$ are the coefficients of $u_i(A_n)$, which by hypothesis have
magnitude $O(n^{-1/2} \log^C n)$ and square-sum to 1.

Observe that $S_i$ has mean zero and variance 1. Applying Theorem 44 and (27),
we conclude that



P{\S^\ > t) < exp(-ci') + n-^/'log 



i/2i^„.o(i) 



n 



for any t > 1 and some absolute constant c > 0, which easily yields ()104p in the 
range 2™ < log°^^^ n. 

The consideration of (ii) is similar. Write the left-hand side of (|42p as ||7r^f (X„)||, 
where H is the span of the Uj{An) for i- < j < i+. Applying p7)) and Lemma[43lwe 
obtain the claim when — i_ > log*^^ n for sufficiently large C3 (in fact (ii) now fails 
with overwhelming probability), so we may assume instead that z+ — i_ < log*^^^' n. 
In this case, the event (ii) can now be expressed as 

(106) \S\< (*+"*-)'^' 



2m/4 logO OOS ^ 

where S E C^+^^- is the random vector with components Sj defined in ()105p . 

From the orthonormality of the Ui{An), we see that S has mean zero and has 
covariance matrix equal to the identity. Applying Theorem again, we see that 

F{\S\ < t) « 0{t/{i+ - i_)l/2)(^+~^-)/4 + ^-1/2^-3 iQgO(l) ^ 

Applying this with t := 2'"% lo'g"'*""-'' n ^""^"^ using the fact that — ^ > C2 by 
hypothesis, one concludes that (|106l) occurs with probability 

« 0(2"/* log"-0"^ n)-^^/4 ^ „-l/223m/4 i^gO(l) ^ 

which proves the claim as long as C2 is large and 2™ < But the case 

2™ > ri^/^°° then follows by noting that the probability of the event (jlOSp is non- 
increasing in m. The proof of Proposition [53] is now complete. 
Appendix A. Concentration of determinant

In view of the standard identity

$\int_{-2}^{2} \log|y|\, \rho_{sc}(y)\, dy = -\frac{1}{2}$

(which can be verified for instance by applying contour integration to a branch cut
of $(4 - z^2)^{1/2} \log z$ around the slit $[-2, 2]$; see also Remark 67 below) and Stirling's
formula, it suffices to prove the latter claim.
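
The standard identity above can also be confirmed numerically. The following is a rough sketch using a midpoint rule; the number of cells is an arbitrary choice, and the density $\rho_{sc}(y) = \frac{1}{2\pi}\sqrt{4 - y^2}$ on $[-2, 2]$ is assumed as the normalization of the semicircular law.

```python
import math


def rho_sc(y):
    """Semicircular density (1 / (2*pi)) * sqrt(4 - y^2) on [-2, 2]."""
    return math.sqrt(max(4.0 - y * y, 0.0)) / (2.0 * math.pi)


def log_potential_at_zero(cells=200000):
    """Midpoint-rule approximation of the integral of log|y| * rho_sc(y).

    With an even number of cells no midpoint lands on y = 0, so the
    (integrable) logarithmic singularity is never evaluated.
    """
    h = 4.0 / cells
    total = 0.0
    for i in range(cells):
        y = -2.0 + (i + 0.5) * h
        total += math.log(abs(y)) * rho_sc(y)
    return total * h


value = log_potential_at_zero()
```

The computed `value` should be close to $-1/2$; the quadrature error comes mostly from the cells adjacent to the singularity at the origin.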

Fix $z$; we allow implied constants to depend on $z$. We of course have

$\log|\det(M_n - \sqrt{n} z I)| = \sum_{j=1}^n \log|\lambda_j(M_n) - \sqrt{n} z| = \frac{1}{2} n \log n + \sum_{j=1}^n \log|\lambda_j(W_n) - z|$

so it will suffice to show that

$\left| \frac{1}{n} \sum_{j=1}^n \log|\lambda_j(W_n) - z| - \int_{-2}^{2} \log|y - z|\, \rho_{sc}(y)\, dy \right| \le n^{-c}$

asymptotically almost surely for some $c > 0$. Making the change of variables
$y = t(x)$, where $t$ is defined in (3), it suffices to show that

$\left| \frac{1}{n} \sum_{j=1}^n \log|\lambda_j(W_n) - z| - \int_0^1 \log|t(x) - z|\, dx \right| \le n^{-c}$

asymptotically almost surely for some $c > 0$.

From Theorem [TT] (with fc = 1) we have infj |Aj(W„) — z\ > (say) asymp- 
totically almost surely (because the expected number of eigenvalues in the interval 
[z — ri~^, z + n~^] is o(l)). From this (and (|4])) we conclude that sup^ log \Xj {Wn) — 
z\ = O(logn) asymptotically almost surely. Thus, the contribution of all j with 
\t (^) — Rez| < will be negligible for any fixed e > 0, and it suffices to show 
that 



E 



log \X,iWn) -Z\- log \t{x) - Z\ dx 



< n 



l<J<n:| (i)-Rcz|>Ti-E 

asymptotically almost surely. 



By © we see that with probability 1 - o(l), one has \]{Wn) = Hn) + 0{n^^) 
for all 1 < j < n and some absolute constant 5 > 0, where —2 < t{a) < 2 is defined 
by ([3]). By Taylor expansion, we thus have asymptotically almost surely that 



log|A,(W^„)-z| =log 



0(n 
(say) for all $1 \le j \le n$ with $|t(j/n) - \mathrm{Re}\, z| \ge n^{-\varepsilon}$, if $\varepsilon$ is chosen sufficiently small
depending on $\delta$. The claim then follows (for $\varepsilon$ small enough) by approximating
$\int_0^1 \log|t(x) - z|\, dx$ by its Riemann sum away from the possible singularity at
$\mathrm{Re}\, z$.

Remark 67. The logarithmic potential $\int_{-2}^{2} \log|y - z|\, \rho_{sc}(y)\, dy$ for the semicircular
distribution can be computed explicitly as

$\int_{-2}^{2} \log|y - z|\, \rho_{sc}(y)\, dy = \frac{1}{2} \mathrm{Re}\, \frac{z - \sqrt{z^2 - 4}}{z + \sqrt{z^2 - 4}} + \log\left| \frac{\sqrt{z^2 - 4} + z}{2} \right|$

where $\sqrt{z^2 - 4}$ is the branch of the square root of $z^2 - 4$ with cut at $[-2, 2]$ which
is asymptotic to $z$ at infinity; this can be seen by integrating the formula

(107) $\int_{-2}^{2} \frac{1}{y - z}\, \rho_{sc}(y)\, dy = \frac{1}{2}\left( -z + \sqrt{z^2 - 4} \right)$

for the Stieltjes transform of the semicircular distribution, which can be easily verified
by the Cauchy integral formula.

Appendix B. The distance between a random vector and a subspace

The purpose of this appendix is to prove Lemma 43. We restate this lemma for
the reader's convenience.

Lemma 68. Let $X = (\xi_1, \dots, \xi_n) \in \mathbb{C}^n$ be a random vector whose entries are
independent with mean zero, variance 1, and are bounded in magnitude by $K$ almost
surely for some $K$, where $K \ge 10(\mathbf{E}|\xi|^4 + 1)$. Let $H$ be a subspace of dimension $d$
and $\pi_H$ the orthogonal projection onto $H$. Then

$\mathbf{P}\left( \left| \|\pi_H(X)\| - \sqrt{d} \right| \ge t \right) \le 10 \exp\left( -\frac{t^2}{10K^2} \right)$.

In particular, one has

$\|\pi_H(X)\| = \sqrt{d} + O(K \log n)$

with overwhelming probability.

It is easy to show that $\mathbf{E}\|\pi_H(X)\|^2 = d$, so it is indeed natural to expect that with
high probability $\|\pi_H(X)\|$ is around $\sqrt{d}$.

In a previous paper [44], the authors proved Lemma 68 for the special case when
$\xi_i$ are Bernoulli random variables (taking value $\pm 1$ with probability half). This
proof is a simple generalization of one in [44] (see also [46, Appendix E]). We use
the following theorem, which is a consequence of Talagrand's inequality (see [46,
Theorem E.2] or [31], [33]).

Theorem 69 (Talagrand's inequality). Let $D$ be the unit disk $\{z \in \mathbb{C}, |z| \le 1\}$. For
every product probability $\mu$ on $D^n$, every convex 1-Lipschitz function $F : \mathbb{C}^n \to \mathbb{R}$,
and every $r \ge 0$,

$\mu(|F - M(F)| \ge r) \le 4 \exp(-r^2/16)$,

where $M(F)$ denotes the median of $F$.



Remark 70. In fact, the result still holds for the space $D_1 \times \dots \times D_n$ where
the $D_i$ are complex regions with diameter 2.



An easy change of variables reveals the following generalization of this inequality:
if $\mu$ is supported on a dilate $K \cdot D^n$ of the unit disk for some $K > 0$ rather than
$D^n$ itself, then for every $r > 0$ we have

(108) $\mu(|F - M(F)| \ge r) \le 4 \exp(-r^2/16K^2)$.

In what follows, we assume that $K \ge g(n)$, where $g(n)$ is tending (arbitrarily
slowly) to infinity with $n$. The map $X \to \|\pi_H(X)\|$ is clearly convex and 1-Lipschitz.
Applying (108) we conclude that

(109) $\mathbf{P}\left( \left| \|\pi_H(X)\| - M(\|\pi_H(X)\|) \right| \ge t \right) \le 4 \exp(-t^2/16K^2)$

for any $t > 0$. To conclude the proof, it suffices to show that

(110) $\left| M(\|\pi_H(X)\|) - \sqrt{d} \right| \le 2K$.



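Let $P = (p_{jk})_{1 \le j,k \le n}$ be the $n \times n$ orthogonal projection matrix onto $H$. We have
that $\mathrm{trace}\, P^2 = \mathrm{trace}\, P = \sum_i p_{ii} = d$ and $|p_{ii}| \le 1$. Furthermore,

$\|\pi_H(X)\|^2 = \sum_{1 \le i,j \le n} p_{ij} \xi_i \bar\xi_j = \sum_{i=1}^n p_{ii} |\xi_i|^2 + \sum_{1 \le i \neq j \le n} p_{ij} \xi_i \bar\xi_j$.

Consider the event $\mathcal{E}_+$ that $\|\pi_H(X)\| \ge \sqrt{d} + 2K$. Since this implies $\|\pi_H(X)\|^2 \ge
d + 4\sqrt{d}K + 4K^2$, we have

$\mathbf{P}(\mathcal{E}_+) \le \mathbf{P}\left( \sum_{i=1}^n p_{ii}|\xi_i|^2 \ge d + 2\sqrt{d}K \right) + \mathbf{P}\left( \left| \sum_{1 \le i \neq j \le n} p_{ij}\xi_i\bar\xi_j \right| \ge 2\sqrt{d}K \right)$.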



Let Si :— J2"^i Pii{\^i\'^ — 1), then we have by Chebyshev's inequality 



P(X^K»IC»I' >d + 2VdK) < P{\Si\ > 2VdK) < 



i=l 



On the other hand, by the assumption on K 



Thus, 



P(|5i|>2Vrfif)<^M) <^)<1/10. 
Similarly, set $S_2 := |\sum_{i \neq j} p_{ij}\xi_i\bar\xi_j|$. We have $\mathbf{E} S_2^2 = \sum_{i \neq j} |p_{ij}|^2 \le d$. So again by
Chebyshev's inequality

$\mathbf{P}(S_2 \ge 2\sqrt{d}K) \le \frac{d}{4dK^2} \le 1/10$.

It follows that $\mathbf{P}(\mathcal{E}_+) \le 1/5$ and so $M(\|\pi_H(X)\|) \le \sqrt{d} + 2K$. To prove the lower
bound, let $\mathcal{E}_-$ be the event that $\|\pi_H(X)\| \le \sqrt{d} - 2K$ and notice that

$\mathbf{P}(\mathcal{E}_-) \le \mathbf{P}(\|\pi_H(X)\|^2 \le d - 2\sqrt{d}K) \le \mathbf{P}(S_1 \le -\sqrt{d}K) + \mathbf{P}(S_2 \ge \sqrt{d}K)$.

Both terms on the RHS can be bounded by 1/5 by the same argument as above.
The proof is complete.
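
The concentration of $\|\pi_H(X)\|$ around $\sqrt{d}$ is easy to observe in a small simulation. The sketch below is only an illustration under stated assumptions: it uses a real subspace obtained from Gaussian vectors by Gram-Schmidt, Bernoulli ($\pm 1$) entries for $X$, and hypothetical sizes `n = 120`, `d = 40` with a fixed seed; none of these choices come from the paper.

```python
import math
import random


def random_orthonormal_basis(n, d, rng):
    """Orthonormal basis of a random d-dimensional subspace of R^n
    (modified Gram-Schmidt applied to Gaussian vectors)."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in basis:
            c = sum(a * b for a, b in zip(u, v))
            v = [b - c * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        if norm > 1e-9:
            basis.append([b / norm for b in v])
    return basis


def projection_norm(basis, x):
    """Norm of the orthogonal projection of x onto span(basis)."""
    return math.sqrt(sum(sum(a * b for a, b in zip(u, x)) ** 2 for u in basis))


rng = random.Random(0)
n, d = 120, 40
basis = random_orthonormal_basis(n, d, rng)

# Deviation of the projection norm from sqrt(d) over several Bernoulli vectors.
deviations = []
for _ in range(50):
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    deviations.append(abs(projection_norm(basis, x) - math.sqrt(d)))
```

In this bounded setting ($K = 1$) the deviations stay of order one while $\sqrt{d}$ grows with the dimension, in line with the sub-Gaussian tail in Lemma 68.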



Appendix C. Controlling the spectral density by the Stieltjes 

transform 

In this appendix we establish Lemma [53] and Proposition 1661 

Proof. (Proof of Lemma[64l) From (j90| we see that with overwhelming probability, 
one has 

|s„(x + < 1 

for all — L < X < L which are multiples of (say). From (|89|) one concludes 

that 

n^772 + |A.(W„)-a;|2 ^ 
and we thus have the crude bound 

Ni <C rjn 

whenever / C [—L, L] is an interval of length |/| = ?/. Summing in /, we thus obtain 
the bound 

(111) Ni^\I\n 

with overwhelming probability whenever / C [—L, L] has length |/| > rj. (One 
could also invoke Proposition 1551 for this step.) 

Next, let / C [—L + e, L — e] be such that |/| > 277, and consider the function 
where the sum ranges over a\\ x £ I that are multiples of n~^'^^. Observe that 

n 

-VF(A.(I^„)) = n-™-Im V sn{x + ^r,) 

n ^ — ' TT -"^ — ' 

and 

f F{y)p,,{y) dy^n-'"°-lm V s{x + V^r^). 
In TT — 
With overwhelming probability, we have $s_n(x + \sqrt{-1}\eta) = s(x + \sqrt{-1}\eta) + O(\delta)$ for
all $x$ in the sum by hypothesis, and hence

$\frac{1}{n} \sum_{i=1}^n F(\lambda_i(W_n)) = \int_{\mathbb{R}} F(y)\, \rho_{sc}(y)\, dy + O(|I|\delta)$.



On the other hand, from Riemann integration one sees that 

+ \y- 

(say). One can then establish the pointwise bounds 

1 



^(^)« l + (dist(.,I)/.) +-'" 
when y ^ I and dist(2/, /) < |/|, 

F(v) <C — + n-^° 

^^^^<<dist(y,/)2+" 

when y ^ I and dist{y,I) > |/|, and (since ^(^^■i^^^^y_^^i^ has integral 1) in the 
remaining case 

Using these bounds one sees that 

F{y)Psc{y) dy^J^ P^'^iy) dy + O ^77 log 1^ 

and a similar argument using Riemann integration and (as well as the trivial 

bound Nj < n when J lies outside [—L, L]) gives 



1 " 1 / I/I 

- Vf(A,(W„)) - -iV, + 0, UlogU 



i=i ^ ' 



Putting all this together, we conclude that 

Ni^nJ^Psciy) dy + Oe{Sn\I\) + Oe (^r^nlog^-^ 

The latter error term can be absorbed into the former since |/| > log j-, and the 
claim follows. □ 



Proof. (Proof of Lemma [66]) By the union bound it suffices to show this for |/| = 
77 := ^ " ■ Let X be the center of /, then by ((89)) it suffices to show that the 
event that 

(112) Ni > Cnri 
and 

(113) Ims„(a; + ^f^ri) > C 

for some large absolute constant C, fails with overwhelming probability. 
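Suppose that we have both (112) and (113). By (92) we have

$\frac{1}{n} \sum_{k=1}^n \left| \mathrm{Im}\, \frac{1}{\frac{1}{\sqrt{n}}\zeta_{kk} - (x + \sqrt{-1}\eta) - Y_k} \right| \ge C$;

using the crude bound $|\mathrm{Im}\, \frac{1}{z}| \le \frac{1}{|\mathrm{Im}\, z|}$, we conclude

$\frac{1}{n} \sum_{k=1}^n \frac{1}{|\eta + \mathrm{Im}\, Y_k|} \ge C$.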

On the other hand, by writing Wn^u in terms of an orthonormal basis Uj{Wn.k) of 
eigenfunctions, one sees that 

(114) Y^Jy. 



and hence 

""^ |uj(VF„,fc)*afe|2 



ry2 + (A,(W„^,)-x)2- 

On the other hand, from \\Y1\ we can find 1 < i_ < i+ < n with ij^ ~ > rjn 
such that Xi{Wn) G / for all i- < i < i+; hy the Cauchy interlacing property (|24|) 
we thus have Ai(W„,fc) e / for < i < i+. We conclude that 

Imrfe>i V \uj{Wn,kyak\^ = -\\PH,ak\\^ 

where Pff^ is the orthogonal projection to the -dimensional space spanned 

by the eigenvectors Uj{yVn.k) for i- < j < i+- Putting all this together, we conclude 
that 

1 " 7] 

n ri^ + \\PH,ak\\^ ^ ^' 

On the other hand, from Lemma 1551 we see that H-PfffcOfclP = 0{ri) with overwhelm- 
ing probability. (One has to take the union bound over all possible choices of 
but there are only 0{tt?) such choices at most, so this is not a problem.) The claim 
then follows by taking C sufficiently large. □ 



Appendix D. A multidimensional Berry-Esseen theorem 



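In this section, we prove Theorem 44. We will need the following multidimensional
Berry-Esseen theorem, which is a generalisation of [46, Proposition D.2].

Theorem 71. Let $N, n \ge 1$ be integers, let $v_1, \dots, v_n \in \mathbb{C}^N$ be vectors, let $\zeta_1, \dots, \zeta_n$
be independent complex-valued variables with mean zero, variance $\mathbf{E}|\zeta_j|^2 = 1$, and
the third moment bound

(115) $\sup_{1 \le i \le n} \mathbf{E}|\zeta_i|^3 \le C$

for some constant $C \ge 1$. Let $S$ be the $\mathbb{C}^N$-valued random variable

$S := \sum_{i=1}^n v_i \zeta_i$.

We identify $\mathbb{C}^N$ with $\mathbb{R}^{2N}$ in the usual manner, and define the covariance matrix
$M$ of $S$ to be the unique symmetric $2N \times 2N$ real matrix such that

(116) $u^* M u := \mathbf{E}|\mathrm{Re}\, u^* S|^2$

for all $u \in \mathbb{C}^N = \mathbb{R}^{2N}$.

Let $G$ be a gaussian random variable on $\mathbb{R}^{2N} = \mathbb{C}^N$ with mean zero and with the
same covariance matrix $M$ as $S$, thus

$u^* M u = \mathbf{E}|G \cdot u|^2 = \mathbf{E}|\mathrm{Re}\, u^* G|^2$

for all $u \in \mathbb{R}^{2N} = \mathbb{C}^N$ (where $u \cdot v = \mathrm{Re}\, v^* u$ denotes the dot product on $\mathbb{R}^{2N}$). More
explicitly, $G$ has the distribution function

$\frac{1}{(2\pi)^N (\det M)^{1/2}} \exp(-x^* M^{-1} x / 2)\, dx_1 \dots dx_{2N}$

if $M$ is invertible, with an analogous limiting formula when $M$ is singular. Then
for any $\varepsilon > 0$ and any measurable set $\Omega \subset \mathbb{R}^{2N} = \mathbb{C}^N$, one has

(117) $\mathbf{P}(S \in \Omega) \le \mathbf{P}(G \in \Omega \cup \partial_\varepsilon \Omega) + O\left( C N^{3/2} \varepsilon^{-3} \sum_{1 \le j \le n} |v_j|^3 \right)$

and similarly

(118) $\mathbf{P}(S \in \Omega) \ge \mathbf{P}(G \in \Omega \setminus \partial_\varepsilon \Omega) - O\left( C N^{3/2} \varepsilon^{-3} \sum_{1 \le j \le n} |v_j|^3 \right)$

where

$\partial_\varepsilon \Omega := \{ x \in \mathbb{R}^{2N} : \mathrm{dist}_\infty(x, \partial\Omega) \le \varepsilon \}$,

$\partial\Omega$ is the topological boundary of $\Omega$, and $\mathrm{dist}_\infty$ is the distance with respect to the
$\ell^\infty$ metric on $\mathbb{R}^{2N}$.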

Remark 72. The main novelty here, compared with that in [46, Proposition D.2],
is that the random variable $\zeta_j$ is not assumed to be $\mathbb{C}$-normalized (which means
that the real and imaginary parts of $\zeta_j$ have covariance matrix equal to half the
identity.) For instance, some of the $\zeta_j$ could be purely real, or supported on some
other line through the origin, such as the imaginary axis.



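Proof. We obtain the result by repeating the proof of [46, Proposition D.2] with
some proper modification. For the reader's convenience, we present all details.

It suffices to prove (117), as (118) follows by replacing $\Omega$ with its complement.

Let $\psi : \mathbb{R} \to \mathbb{R}^+$ be a bump function supported on the unit ball $\{x \in \mathbb{R} : |x| \le 1\}$
of total mass $\int_{\mathbb{R}} \psi = 1$, let $\Psi_{\varepsilon,N} : \mathbb{R}^{2N} \to \mathbb{R}^+$ be the approximation to the identity

$\Psi_{\varepsilon,N}(x_1, \dots, x_{2N}) := \prod_{i=1}^{2N} \frac{1}{\varepsilon} \psi\left( \frac{x_i}{\varepsilon} \right)$

and let $f : \mathbb{R}^{2N} \to \mathbb{R}^+$ be the convolution

(119) $f(x) = \int_{\mathbb{R}^{2N}} \Psi_{\varepsilon,N}(y)\, 1_\Omega(x - y)\, dy$

where $1_\Omega$ is the indicator function of $\Omega$. Observe that $f$ equals 1 on $\Omega \setminus \partial_\varepsilon \Omega$, vanishes
outside of $\Omega \cup \partial_\varepsilon \Omega$, and is smoothly varying between 0 and 1 on $\partial_\varepsilon \Omega$. Thus it will
suffice to show that

$|\mathbf{E}(f(S)) - \mathbf{E}(f(G))| \ll C N^{3/2} \varepsilon^{-3} \left( \sum_{1 \le j \le n} |v_j|^3 \right)$.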

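We now use a Lindeberg replacement trick (cf. [32], [40]). For each $1 \le i \le n$, let
$g_i$ be a complex gaussian with mean zero and with the same covariance matrix as
$\zeta_i$, thus

$\mathbf{E}\,\mathrm{Re}(g_i)^2 = \mathbf{E}\,\mathrm{Re}(\zeta_i)^2$; $\mathbf{E}\,\mathrm{Im}(g_i)^2 = \mathbf{E}\,\mathrm{Im}(\zeta_i)^2$; $\mathbf{E}\,\mathrm{Re}(g_i)\mathrm{Im}(g_i) = \mathbf{E}\,\mathrm{Re}(\zeta_i)\mathrm{Im}(\zeta_i)$.

In particular $g_i$ has mean zero and variance 1. We construct the $g_1, \dots, g_n$ to be
jointly independent. Observe from (116) that the random variable

$g_1 v_1 + \dots + g_n v_n \in \mathbb{C}^N$

has mean zero and covariance matrix $M$, and thus has the same distribution as $G$.
Thus if we define the random variables

$S_j := \zeta_1 v_1 + \dots + \zeta_j v_j + g_{j+1} v_{j+1} + \dots + g_n v_n \in \mathbb{C}^N$

we have the telescoping triangle inequality

(120) $|\mathbf{E}(f(S)) - \mathbf{E}(f(G))| \le \sum_{j=1}^n |\mathbf{E} f(S_j) - \mathbf{E} f(S_{j-1})|$.

For each $1 \le j \le n$, we may write

$S_j = S_j' + \zeta_j v_j$; $S_{j-1} = S_j' + g_j v_j$

where

$S_j' := \zeta_1 v_1 + \dots + \zeta_{j-1} v_{j-1} + g_{j+1} v_{j+1} + \dots + g_n v_n$.

By Taylor's theorem with remainder we thus have

(121) $f(S_j) = P_{S_j'}(\mathrm{Re}\, \zeta_j, \mathrm{Im}\, \zeta_j) + O\left( |\zeta_j|^3 \sup_{x \in \mathbb{R}^{2N}} \sum_{k=0}^3 |(v_j \cdot \nabla)^k (\sqrt{-1} v_j \cdot \nabla)^{3-k} f(x)| \right)$

and

(122) $f(S_{j-1}) = P_{S_j'}(\mathrm{Re}\, g_j, \mathrm{Im}\, g_j) + O\left( |g_j|^3 \sup_{x \in \mathbb{R}^{2N}} \sum_{k=0}^3 |(v_j \cdot \nabla)^k (\sqrt{-1} v_j \cdot \nabla)^{3-k} f(x)| \right)$

where $P_{S_j'}$ is some quadratic polynomial depending on $S_j'$, and $v_j$ and $\sqrt{-1} v_j$ are
viewed as vectors in $\mathbb{R}^{2N}$. A computation using (119) and the Leibniz rule reveals
that all third partial derivatives of $f$ have magnitude $O(\varepsilon^{-3})$, and so by
Cauchy-Schwarz we have

$\sum_{k=0}^3 |(v_j \cdot \nabla)^k (\sqrt{-1} v_j \cdot \nabla)^{3-k} f(x)| \ll |v_j|^3 N^{3/2} \varepsilon^{-3}$.

Observe that $\zeta_j$, $g_j$ are independent of $S_j'$, and have the same mean and covariance
matrix. Subtracting (121) from (122) and taking expectations using (115) we
conclude that

$|\mathbf{E}(f(S_j)) - \mathbf{E}(f(S_{j-1}))| \ll C |v_j|^3 N^{3/2} \varepsilon^{-3}$

and the claim follows from (120). □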
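
The mechanism of the replacement argument, namely that swapping in gaussians with matching first and second moments changes expectations only through higher-order terms in the $v_j$, can be seen exactly in a toy case. The sketch below is a real-valued, one-dimensional analogue and not the setting of Theorem 71: the $\zeta_j$ are taken to be Rademacher signs, the test functions are the monomials $x^2$ and $x^4$, and the coefficient vector `v` is hypothetical.

```python
from itertools import product


def exact_rademacher_moment(v, p):
    """E[(sum_j v_j * eps_j)^p] for independent signs eps_j, by enumeration."""
    total = 0.0
    for signs in product((-1.0, 1.0), repeat=len(v)):
        total += sum(s * c for s, c in zip(signs, v)) ** p
    return total / 2 ** len(v)


def gaussian_moment(v, p):
    """E[(sum_j v_j * g_j)^p] for independent standard gaussians g_j."""
    var = sum(c * c for c in v)
    if p == 2:
        return var
    if p == 4:
        return 3.0 * var * var
    raise ValueError("only p = 2 and p = 4 are implemented in this sketch")


# Hypothetical coefficients, used only for illustration.
v = [0.5, -0.3, 0.2, 0.4, -0.1, 0.6, 0.25, -0.35]

second_gap = gaussian_moment(v, 2) - exact_rademacher_moment(v, 2)
fourth_gap = gaussian_moment(v, 4) - exact_rademacher_moment(v, 4)
predicted_fourth_gap = 2.0 * sum(c ** 4 for c in v)
```

Quadratic test functions cannot distinguish the two sums, while the fourth-moment gap equals $2\sum_j v_j^4$, a quantity governed by higher powers of the individual coefficients. For a smooth but non-polynomial $f$ the analogous discrepancy is the third-order Taylor remainder estimated in the proof.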
Remark 73. The bounds here are not best possible, but are sufficient for our
applications.



Now we are ready to prove Theorem 44.



Proof. (Proof of Theorem 44) We first prove the upper tail bound on $S_i$. Here, the
main tool is the $N = 1$ case of Theorem 71. The variance of $S_i$ is

$$\mathbf{E}|S_i|^2 = \sum_{j=1}^{n} |a_{i,j}|^2 = 1 \tag{123}$$

since the rows of $A$ have unit size. Thus, the $2 \times 2$ covariance matrix of $S_i$ is $O(1)$.
Let $G_i$ be a complex gaussian with mean zero and the same covariance matrix as
$S_i$. By Theorem 71 we have

$$\mathbf{P}(|S_i| \ge t) \le \mathbf{P}(|G_i| \ge t - \sqrt{2}\varepsilon) + O\Big(C \varepsilon^{-3} \sum_{i=1}^{N} \sum_{j=1}^{n} |a_{i,j}|^3\Big)$$

for any $\varepsilon > 0$. Selecting $\varepsilon := 1$ (say), and using the fact that $G_i$ has variance 1, we
conclude that

$$\mathbf{P}(|S_i| \ge t) \le \exp(-ct^2) + O\Big(C \sum_{i=1}^{N} \sum_{j=1}^{n} |a_{i,j}|^3\Big),$$

and the claim follows from and (123).

Now we prove the lower tail bound on $S$, using Theorem 71 in full generality.
Observe that for any unit vector $u \in \mathbf{C}^N \equiv \mathbf{R}^{2N}$, one has

$$\mathbf{E}|u^* S|^2 = \|u^* A\|^2 = 1$$

by the orthonormality of the rows of $A$. Thus, by (116), the covariance matrix $M$ of $S$
has operator norm at most 1. On the other hand, we
have

$$\operatorname{trace} M = \mathbf{E}|S|^2 = \|A\|_F^2 = N.$$

Thus, the $2N$ eigenvalues of $M$ range between 0 and 1 and add up to $N$. This
implies that at least $N/2$ of them are at least $1/4$, and so one can find an $\lfloor N/2 \rfloor$-dimensional
real subspace $V$ of $\mathbf{R}^{2N}$ such that $M$ is invariant on $V$ and has all
eigenvalues at least $1/4$ on $V$.
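
The counting claim can be checked by a short pigeonhole computation. In the sketch below, $k$ is a name introduced here for the number of eigenvalues $\lambda_1, \ldots, \lambda_{2N}$ of $M$ that are at least $1/4$:

```latex
\begin{align*}
N = \sum_{i=1}^{2N} \lambda_i
  &\le k \cdot 1 + (2N - k) \cdot \tfrac14
   = \tfrac34 k + \tfrac12 N, \\
\text{so}\quad k &\ge \tfrac23 N \ge \tfrac12 N .
\end{align*}
% Eigenvalues that are at least 1/4 are bounded by 1; the remaining
% 2N - k eigenvalues are each less than 1/4.
```

One may then take $V$ to be spanned by eigenvectors for $\lfloor N/2 \rfloor$ of these $k$ eigenvalues.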

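Now let $G$ be a gaussian in $\mathbf{C}^N \equiv \mathbf{R}^{2N}$ with mean zero and covariance matrix $M$.
By Theorem 71 we have

$$\mathbf{P}(|S| \le t) \le \mathbf{P}(|G| \le t + \sqrt{2N}\varepsilon) + O\Big(C N^{3/2} \varepsilon^{-3} \sum_{i=1}^{N} \sum_{j=1}^{n} |a_{i,j}|^3\Big)$$

for any $\varepsilon > 0$. By ([Ml), (123) we have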

$$\sum_{i=1}^{N} \sum_{j=1}^{n} |a_{i,j}|^3 \le \sigma N.$$
Setting $\varepsilon := t/\sqrt{2N}$, we conclude that

$$\mathbf{P}(|S| \le t) \le \mathbf{P}(|G| \le 2t) + O(C N^4 t^{-3} \sigma).$$
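
The exponents in the error term can be traced as follows. This is a sketch that assumes the double sum $\sum_i \sum_j |a_{i,j}|^3$ is bounded by $\sigma N$, where $\sigma$ is the quantity appearing in the error term; that bound is what the stated error term suggests:

```latex
\begin{align*}
t + \sqrt{2N}\,\varepsilon &= t + t = 2t
  && \text{for } \varepsilon = t/\sqrt{2N}, \\
C N^{3/2} \varepsilon^{-3} \cdot \sigma N
  &= C N^{3/2} \cdot (2N)^{3/2} t^{-3} \cdot \sigma N
   = 2^{3/2}\, C N^{4} t^{-3} \sigma .
\end{align*}
```
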
Let $G_V$ be the orthogonal projection of $G$ to $V$. Clearly

$$\mathbf{P}(|G| \le 2t) \le \mathbf{P}(|G_V| \le 2t).$$

The gaussian $G_V$ has mean zero and covariance matrix at least $\frac14 I_V$ (i.e. all
eigenvalues are at least $1/4$). By applying a linear transformation to reduce the
covariance, we see that the quantity $\mathbf{P}(|G_V| \le 2t)$ is maximized when the covariance
matrix is exactly $\frac14 I_V$. Thus, in any orthonormal basis of $V$, the $\lfloor N/2 \rfloor$ components
$g_1, \ldots, g_{\lfloor N/2 \rfloor}$ of $G_V$ are independent real gaussians of variance $1/4$. If $|G_V| \le 2t$,
then $g_1^2 + \ldots + g_{\lfloor N/2 \rfloor}^2 \le 4t^2$, and thus (by Markov's inequality) $g_i^2 \le 8t^2/N$ for
at least $\lfloor N/4 \rfloor$ of the indices $i$. The number of choices of these indices is at most
$2^{\lfloor N/2 \rfloor}$, and the events $g_i^2 \le 8t^2/N$ are independent and occur with probability
$O(t/\sqrt{N})$, so we conclude from the union bound that

$$\mathbf{P}(|G_V| \le 2t) \le O(t/\sqrt{N})^{\lfloor N/4 \rfloor}$$

and the claim follows. □
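
The last two estimates can be made explicit in the extremal case where the covariance matrix of $G_V$ is exactly $\frac14 I_V$. The constants below are illustrative only and are not claimed to be the ones the authors had in mind:

```latex
% A real gaussian g_i of variance 1/4 has density at most 2/\sqrt{2\pi},
% and the event g_i^2 <= 8t^2/N is an interval of length 4\sqrt{2}\,t/\sqrt{N}:
\begin{align*}
\mathbf{P}\bigl(g_i^2 \le 8t^2/N\bigr)
  &\le \frac{2}{\sqrt{2\pi}} \cdot \frac{4\sqrt{2}\,t}{\sqrt{N}}
   = \frac{8}{\sqrt{\pi}} \cdot \frac{t}{\sqrt{N}}, \\
\mathbf{P}\bigl(|G_V| \le 2t\bigr)
  &\le 2^{\lfloor N/2 \rfloor}
     \Bigl( \frac{8}{\sqrt{\pi}} \cdot \frac{t}{\sqrt{N}} \Bigr)^{\lfloor N/4 \rfloor}
   \le 2 \Bigl( \frac{32}{\sqrt{\pi}} \cdot \frac{t}{\sqrt{N}} \Bigr)^{\lfloor N/4 \rfloor}.
\end{align*}
% The final step uses \lfloor N/2 \rfloor <= 2 \lfloor N/4 \rfloor + 1,
% so that 2^{\lfloor N/2 \rfloor} <= 2 \cdot 4^{\lfloor N/4 \rfloor}.
```

The first factor in the second line counts the possible index sets, and the power comes from the independence of the $\lfloor N/4 \rfloor$ events attached to a fixed index set.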